Non-invasive determination of tissue source of cell-free DNA

ABSTRACT

Systems, methods, and apparatuses can determine and use methylation profiles of various tissues and samples. Examples are provided. A methylation profile can be deduced for fetal/tumor tissue based on a comparison of plasma methylation (or other sample with cell-free DNA) to a methylation profile of the mother/patient. A methylation profile can be determined for fetal/tumor tissue using tissue-specific alleles to identify DNA from the fetus/tumor when the sample has a mixture of DNA. A methylation profile can be used to determine copy number variations in genome of a fetus/tumor. Methylation markers for a fetus have been identified via various techniques. The methylation profile can be determined by determining a size parameter of a size distribution of DNA fragments, where reference values for the size parameter can be used to determine methylation levels. Additionally, a methylation level can be used to determine a level of cancer.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a divisional of U.S. patent application Ser. No. 15/647,824, entitled “NON-INVASIVE DETERMINATION OF METHYLOME OF FETUS OR TUMOR FROM PLASMA,” filed on Jul. 12, 2017, which is a continuation of U.S. patent application Ser. No. 13/842,209, entitled “Non-Invasive Determination Of Methylome Of Fetus Or Tumor From Plasma,” filed on Mar. 15, 2013, now U.S. Pat. No. 9,732,390, which is a non-provisional of and claims the benefit of U.S. Provisional Patent Application No. 61/703,512, entitled “Method Of Determining The Whole Genome DNA Methylation Status Of The Placenta By Massively Parallel Sequencing Of Maternal Plasma,” filed on Sep. 20, 2012, each of which are herein incorporated by reference in their entirety for all purposes.

FIELD

The present disclosure relates generally to a determination of a methylation pattern (methylome) of DNA, and more particularly to analyzing a biological sample (e.g., plasma) that includes a mixture of DNA from different genomes (e.g., from fetus and mother, or from tumor and normal cells) to determine the methylation pattern (methylome) of the minority genome. Uses of the determined methylome are also described.

BACKGROUND

Embryonic and fetal development is a complex process and involves a series of highly orchestrated genetic and epigenetic events. Cancer development is also a complex process, typically involving multiple genetic and epigenetic steps. Abnormalities in the epigenetic control of developmental processes are implicated in infertility, spontaneous abortion, intrauterine growth abnormalities and postnatal consequences. DNA methylation is one of the most frequently studied epigenetic mechanisms. Methylation of DNA mostly occurs in the context of the addition of a methyl group to the 5 carbon of cytosine residues among CpG dinucleotides. Cytosine methylation adds a layer of control to gene transcription and DNA function. For example, hypermethylation of gene promoters enriched with CpG dinucleotides, termed CpG islands, is typically associated with repression of gene function.

Despite the important role of epigenetic mechanisms in mediating developmental processes, human embryonic and fetal tissues are not readily accessible for analysis (tumors may similarly not be accessible). Studies of the dynamic changes of such epigenetic processes in health and disease during the prenatal period in humans are virtually impossible. Extraembryonic tissues, particularly the placenta, which can be obtained as part of prenatal diagnostic procedures or after birth, have provided one of the main avenues for such investigations. However, obtaining such tissues requires invasive procedures.

The DNA methylation profile of the human placenta has intrigued researchers for decades. The human placenta exhibits a plethora of peculiar physiological features involving DNA methylation. On a global level, placental tissues are hypomethylated when compared with most somatic tissues. At the gene level, the methylation status of selected genomic loci is a specific signature of placental tissues. Both the global and locus-specific methylation profiles show gestational-age dependent changes. Imprinted genes, namely genes for which expression is dependent on the parental origin of alleles, serve key functions in the placenta. The placenta has been described as pseudomalignant, and hypermethylation of several tumor suppressor genes has been observed.

Studies of the DNA methylation profile of placental tissues have provided insights into the pathophysiology of pregnancy-associated or developmentally-related diseases, such as preeclampsia and intrauterine growth restriction. Disorders in genomic imprinting are associated with developmental disorders, such as Prader-Willi syndrome and Angelman syndrome. Altered profiles of genomic imprinting and global DNA methylation in placental and fetal tissues have been observed in pregnancies resulting from assisted reproductive techniques (H. Hiura et al. 2012 Hum Reprod; 27: 2541-2548). A number of environmental factors such as maternal smoking (K. E. Haworth et al. 2013 Epigenomics; 5: 37-49), maternal dietary factors (X. Jiang et al. 2012 FASEB J; 26: 3563-3574) and maternal metabolic status such as diabetes (N. Hajj et al., Diabetes. doi: 10.2337/db12-0289) have been associated with epigenetic aberrations of the offspring.

Despite decades of efforts, there had not been any practical means available to study the fetal or tumor methylome and to monitor the dynamic changes throughout pregnancy or during disease processes, such as malignancies. Therefore, it is desirable to provide methods for analyzing all or portions of a fetal methylome and a tumor methylome noninvasively.

SUMMARY

Embodiments provide systems, methods, and apparatuses for determining and using methylation profiles of various tissues and samples. Examples are provided. A methylation profile can be deduced for fetal/tumor tissue based on a comparison of plasma methylation (or other sample with cell-free DNA) to a methylation profile of the mother/patient. A methylation profile can be determined for fetal/tumor tissue using tissue-specific alleles to identify DNA from the fetus/tumor when the sample has a mixture of DNA. A methylation profile can be used to determine copy number variations in genome of a fetus/tumor. Methylation markers for a fetus have been identified via various techniques. The methylation profile can be determined by determining a size parameter of a size distribution of DNA fragments, where reference values for the size parameter can be used to determine methylation levels.

Additionally, a methylation level can be used to determine a level of cancer. In the context of cancer, the measurement of the methylomic changes in plasma can be used for detecting the cancer (e.g. for screening purposes), for monitoring (e.g. to detect response following anti-cancer treatment and to detect cancer relapse) and for prognostication (e.g. for measuring the load of cancer cells in the body, for staging purposes, or for assessing the chance of death from disease or disease progression).

A better understanding of the nature and advantages of embodiments of the present invention may be gained with reference to the following detailed description and the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A shows a table 100 of sequencing results for maternal blood, placenta, and maternal plasma according to embodiments of the present invention.

FIG. 1B shows methylation density in 1-Mb windows of sequenced samples according to embodiments of the present invention.

FIGS. 2A-2C show plots of the beta-values against the methylation indices: (A) Maternal blood cells, (B) Chorionic villus sample, (C) Term placental tissue.

FIGS. 3A and 3B show bar charts of percentage of methylated CpG sites in plasma and blood cells collected from an adult male and a non-pregnant adult female: (A) Autosomes, (B) Chromosome X.

FIGS. 4A and 4B show plots of methylation densities of corresponding loci in blood cell DNA and plasma DNA: (A) Non-pregnant adult female, (B) Adult male.

FIGS. 5A and 5B show bar charts of percentage of methylated CpG sites among samples collected from the pregnancy: (A) Autosomes, (B) Chromosome X.

FIG. 6 shows a bar chart of methylation level of different repeat classes of the human genome for maternal blood, placenta and maternal plasma.

FIG. 7A shows a Circos plot 700 for first trimester samples. FIG. 7B shows a Circos plot 750 for third trimester samples.

FIGS. 8A-8D show plots of comparisons of the methylation densities of genomic tissue DNA against maternal plasma DNA for CpG sites surrounding the informative single nucleotide polymorphisms.

FIG. 9 is a flowchart illustrating a method 900 for determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention.

FIG. 10 is a flowchart illustrating a method 1000 of determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention.

FIGS. 11A and 11B show graphs of the performance of the prediction algorithm using maternal plasma data and fractional fetal DNA concentration according to embodiments of the present invention.

FIG. 12A is a table 1200 showing details of 15 selected genomic loci for methylation prediction according to embodiments of the present invention. FIG. 12B is a graph 1250 showing the deduced categories of the 15 selected genomic loci with their corresponding methylation levels in the placenta.

FIG. 13 is a flowchart of a method 1300 for detecting a fetal chromosomal abnormality from a biological sample of a female subject pregnant with at least one fetus.

FIG. 14 is a flowchart of a method 1400 for identifying methylation markers by comparing a placental methylation profile to a maternal methylation profile according to embodiments of the present invention.

FIG. 15A is a table 1500 showing a performance of first trimester differentially methylated region (DMR) identification algorithm using placental methylome with reference to 33 previously reported first trimester markers. FIG. 15B is a table 1550 showing a performance of third trimester DMR identification algorithm using the placental methylome measured using the placenta sample obtained at delivery.

FIG. 16 is a table 1600 showing the numbers of loci predicted to be hypermethylated or hypomethylated based on direct analysis of the maternal plasma bisulfite-sequencing data.

FIG. 17A is a plot 1700 showing size distribution of maternal plasma, non-pregnant female control plasma, placental and peripheral blood DNA. FIG. 17B is a plot 1750 of size distribution and methylation profile of maternal plasma, adult female control plasma, placental tissue and adult female control blood.

FIGS. 18A and 18B are plots of methylation densities and size of plasma DNA molecules according to embodiments of the present invention.

FIG. 19A shows a plot 1900 of methylation densities and the sizes of sequenced reads for an adult non-pregnant female. FIG. 19B is a plot 1950 showing size distribution and methylation profile of fetal-specific and maternal-specific DNA molecules in maternal plasma.

FIG. 20 is a flowchart of a method 2000 for estimating a methylation level of DNA in a biological sample of an organism according to embodiments of the present invention.

FIG. 21A is a table 2100 showing the methylation densities of the pre-operative plasma and the tissue samples of an HCC patient. FIG. 21B is a table 2150 showing the number of sequence reads and the sequencing depth achieved per sample.

FIG. 22 is a table 2200 showing that the methylation densities in the autosomes ranged from 71.2% to 72.5% in the plasma samples of the healthy controls.

FIGS. 23A and 23B show methylation density of buffy coat, tumor tissue, non-tumoral liver tissue, the pre-operative plasma and post-operative plasma of the HCC patient.

FIG. 24A is a plot 2400 showing the methylation densities of the pre-operative plasma from the HCC patient. FIG. 24B is a plot 2450 showing the methylation densities of the post-operative plasma from the HCC patient.

FIGS. 25A and 25B show z-scores of the plasma DNA methylation densities for the pre-operative (plot 2500) and post-operative (plot 2550) plasma samples of the HCC patient using the plasma methylome data of the four healthy control subjects as reference for chromosome 1.

FIG. 26A is a table 2600 showing data for z-scores for pre-operative and post-operative plasma. FIG. 26B is a Circos plot 2620 showing the z-score of the plasma DNA methylation densities for the pre-operative and post-operative plasma samples of the HCC patient using the four healthy control subjects as reference for 1 Mb bins analyzed from all autosomes. FIG. 26C is a table 2640 showing a distribution of the z-scores of the 1 Mb bins for the whole genome in both the pre-operative and post-operative plasma samples of the HCC patient. FIG. 26D is a table 2660 showing that the methylation levels of the tumor tissue and pre-operative plasma sample overlapped with some of the control plasma samples when using the CHH and CHG contexts.

FIGS. 27A-27H show Circos plots of methylation density of 8 cancer patients according to embodiments of the present invention. FIG. 27I is a table 2780 showing the number of sequence reads and the sequencing depth achieved per sample. FIG. 27J is a table 2790 showing a distribution of the z-scores of the 1 Mb bins for the whole genome in plasma of patients with different malignancies. CL=adenocarcinoma of lung; NPC=nasopharyngeal carcinoma; CRC=colorectal carcinoma; NE=neuroendocrine carcinoma; SMS=smooth muscle sarcoma.

FIG. 28 is a flowchart of a method 2800 of analyzing a biological sample of an organism to determine a classification of a level of cancer according to embodiments of the present invention.

FIG. 29A is a plot 2900 showing the distribution of the methylation densities in reference subjects assuming that this distribution follows a normal distribution. FIG. 29B is a plot 2950 showing the distribution of the methylation densities in cancer subjects assuming that this distribution follows a normal distribution and the mean methylation level is 2 standard deviations below the cutoff.

FIG. 30 is a plot 3000 showing the distribution of methylation densities of the plasma DNA of healthy subjects and cancer patients.

FIG. 31 is a graph 3100 showing the distribution of the differences in methylation densities between the mean of the plasma DNA of healthy subjects and the tumor tissue of the HCC patient.

FIG. 32A is a table 3200 showing the effect of reducing the sequencing depth when the plasma sample contained 5% or 2% tumor DNA.

FIG. 32B is a graph 3250 showing the methylation densities of the repeat elements and non-repeat regions in the plasma of the four healthy control subjects, the buffy coat, the normal liver tissue, the tumor tissue, the pre-operative plasma and the post-operative plasma samples of the HCC patient.

FIG. 33 shows a block diagram of an example computer system 3300 usable with systems and methods according to embodiments of the present invention.

DEFINITIONS

A “methylome” provides a measure of an amount of DNA methylation at a plurality of sites or loci in a genome. The methylome may correspond to all of the genome, a substantial part of the genome, or relatively small portion(s) of the genome. A “fetal methylome” corresponds to the methylome of a fetus of a pregnant female. The fetal methylome can be determined using a variety of fetal tissues or sources of fetal DNA, including placental tissues and cell-free fetal DNA in maternal plasma. A “tumor methylome” corresponds to the methylome of a tumor of an organism (e.g., a human). The tumor methylome can be determined using tumor tissue or cell-free tumor DNA in plasma. The fetal methylome and the tumor methylome are examples of a methylome of interest. Other examples of methylomes of interest are the methylomes of organs that can contribute DNA into a bodily fluid (e.g. methylomes of brain cells, bones, the lungs, the heart, the muscles and the kidneys, etc.). The organs may be transplanted organs.

A “plasma methylome” is the methylome determined from the plasma or serum of an animal (e.g., a human). The plasma methylome is an example of a cell-free methylome since plasma and serum include cell-free DNA. The plasma methylome is also an example of a mixed methylome since it is a mixture of fetal/maternal methylome or tumor/patient methylome. The “placental methylome” can be determined from a chorionic villus sample (CVS) or a placental tissue sample (e.g. obtained following delivery). The “cellular methylome” corresponds to the methylome determined from cells (e.g., blood cells) of the patient. The methylome of the blood cells is called the blood cell methylome (or blood methylome).

A “site” corresponds to a single site, which may be a single base position or a group of correlated base positions, e.g., a CpG site. A “locus” may correspond to a region that includes multiple sites. A locus can include just one site, which would make the locus equivalent to a site in that context.

The “methylation index” for each genomic site (e.g., a CpG site) refers to the proportion of sequence reads showing methylation at the site over the total number of reads covering that site. The “methylation density” of a region is the number of reads at sites within the region showing methylation divided by the total number of reads covering the sites in the region. The sites may have specific characteristics, e.g., be CpG sites. Thus, the “CpG methylation density” of a region is the number of reads showing CpG methylation divided by the total number of reads covering CpG sites in the region (e.g., a particular CpG site, CpG sites within a CpG island, or a larger region). For example, the methylation density for each 100-kb bin in the human genome can be determined from the total number of unconverted cytosines (which corresponds to methylated cytosine) at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. This analysis can also be performed for other bin sizes, e.g. 50-kb or 1-Mb, etc. A region could be the entire genome or a chromosome or part of a chromosome (e.g. a chromosomal arm). The methylation index of a CpG site is the same as the methylation density for a region when the region only includes that CpG site. The “proportion of methylated cytosines” refers to the number of cytosine sites, “C's”, that are shown to be methylated (for example unconverted after bisulfite conversion) over the total number of analyzed cytosine residues, i.e. including cytosines outside of the CpG context, in the region. The methylation index, methylation density and proportion of methylated cytosines are examples of “methylation levels.”
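The definitions above can be illustrated with a minimal Python sketch. The function names, the tuple layout of the input, and the read counts are hypothetical choices made only for illustration; the sketch assumes that per-site counts of methylated reads and total covering reads have already been obtained from aligned bisulfite sequencing data.

```python
from collections import defaultdict


def methylation_index(methylated_reads, total_reads):
    """Proportion of reads showing methylation at a single site."""
    if total_reads == 0:
        raise ValueError("site has no covering reads")
    return methylated_reads / total_reads


def methylation_density_by_bin(site_counts, bin_size=100_000):
    """Pool read counts over all sites in each fixed-size bin.

    site_counts: iterable of (chrom, position, methylated_reads, total_reads).
    Returns a dict mapping (chrom, bin_start) to the methylation density.
    """
    methylated = defaultdict(int)
    total = defaultdict(int)
    for chrom, pos, m, n in site_counts:
        key = (chrom, (pos // bin_size) * bin_size)
        methylated[key] += m
        total[key] += n
    # Bins without any covering reads are omitted rather than reported as 0.
    return {k: methylated[k] / total[k] for k in total if total[k] > 0}


# Hypothetical counts for three CpG sites (not data from the study).
sites = [
    ("chr1", 10_500, 8, 10),
    ("chr1", 10_900, 2, 10),
    ("chr1", 150_000, 5, 5),
]
densities = methylation_density_by_bin(sites)
```

Changing `bin_size` to 50_000 or 1_000_000 gives the other bin sizes mentioned above, and a bin containing a single site yields a density equal to that site's methylation index.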

A “methylation profile” (also called methylation status) includes information related to DNA methylation for a region. Information related to DNA methylation can include, but is not limited to, a methylation index of a CpG site, a methylation density of CpG sites in a region, a distribution of CpG sites over a contiguous region, a pattern or level of methylation for each individual CpG site within a region that contains more than one CpG site, and non-CpG methylation. A methylation profile of a substantial part of the genome can be considered equivalent to the methylome. “DNA methylation” in mammalian genomes typically refers to the addition of a methyl group to the 5 carbon of cytosine residues (i.e. 5-methylcytosines) among CpG dinucleotides. DNA methylation may occur in cytosines in other contexts, for example CHG and CHH, where H is adenine, cytosine or thymine. Cytosine methylation may also be in the form of 5-hydroxymethylcytosine. Non-cytosine methylation, such as N6-methyladenine, has also been reported.

A “tissue” corresponds to any cells. Different types of tissue may correspond to different types of cells (e.g., liver, lung, or blood), but also may correspond to tissue from different organisms (mother vs. fetus) or to healthy cells vs. tumor cells. A “biological sample” refers to any sample that is taken from a subject (e.g., a human, such as a pregnant woman, a person with cancer, or a person suspected of having cancer, an organ transplant recipient or a subject suspected of having a disease process involving an organ (e.g., the heart in myocardial infarction, or the brain in stroke)) and contains one or more nucleic acid molecule(s) of interest. The biological sample can be a bodily fluid, such as blood, plasma, serum, urine, vaginal fluid, uterine or vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, etc. Stool samples can also be used.

The term “level of cancer” can refer to whether cancer exists, a stage of a cancer, a size of tumor, whether there is metastasis, the total tumor burden of the body, and/or other measure of a severity of a cancer. The level of cancer could be a number or other characters. The level could be zero. The level of cancer also includes premalignant or precancerous conditions (states) associated with mutations or a number of mutations. The level of cancer can be used in various ways. For example, screening can check if cancer is present in someone who is not known previously to have cancer. Assessment can investigate someone who has been diagnosed with cancer to monitor the progress of cancer, study the effectiveness of therapies or to determine the prognosis. Detection can mean ‘screening’ or can mean checking if someone, with suggestive features of cancer (e.g. symptoms or other positive tests), has cancer.

DETAILED DESCRIPTION

Epigenetic mechanisms play an important role in embryonic and fetal development. However, human embryonic and fetal tissues (including placental tissues) are not readily accessible (U.S. Pat. No. 6,927,028). Certain embodiments have addressed this problem by analyzing a sample that has cell-free fetal DNA molecules present in maternal circulation. The fetal methylome can be deduced in a variety of ways. For example, the maternal plasma methylome can be compared to a cellular methylome (from blood cells of the mother) and the difference is shown to be correlated to the fetal methylome. As another example, fetal-specific alleles can be used to determine the methylation of the fetal methylome at specific loci. Additionally, the size of a fragment can be used as an indicator of a methylation percentage, as a correlation between size and methylation percentage is shown.

In one embodiment, genome-wide bisulfite sequencing is used to analyze the methylation profile (part or all of a methylome) of maternal plasma DNA at single nucleotide resolution. By exploiting the polymorphic differences between the mother and the fetus, the fetal methylome could be assembled from maternal blood samples. In another implementation, polymorphic differences are not used; instead, a differential between the plasma methylome and the blood cell methylome can be used.

In another embodiment, by exploiting single nucleotide variations and/or copy number aberrations between a tumor genome and a nontumor genome, and sequencing data from plasma (or other sample), methylation profiling of a tumor can be performed in the sample of a patient suspected or known to have cancer. A difference in a methylation level in a plasma sample of a test individual when compared with the plasma methylation level of a healthy control or a group of healthy controls can allow the identification of the test individual as harboring cancer. Additionally, the methylation profile can act as a signature that reveals the type of cancer that the person has developed (for example, the organ of origin) and whether metastasis has occurred.
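One way to express such a comparison against a group of healthy controls is a z-score, as used in the figures described above. The following is a minimal Python sketch under stated assumptions: the function name is hypothetical, the control values are illustrative numbers chosen within the 71.2% to 72.5% range mentioned for FIG. 22, the test value is invented, and no classification cutoff is implied.

```python
from statistics import mean, stdev


def methylation_z_score(test_density, control_densities):
    """Number of control standard deviations by which the test sample
    differs from the mean methylation density of the controls."""
    mu = mean(control_densities)
    sd = stdev(control_densities)  # sample standard deviation
    if sd == 0:
        raise ValueError("controls show no variation; z-score is undefined")
    return (test_density - mu) / sd


# Hypothetical plasma methylation densities (%) of four healthy controls.
controls = [71.2, 71.8, 72.1, 72.5]

# Hypothetical test sample with a lower (hypomethylated) plasma density.
z = methylation_z_score(65.0, controls)
```

A strongly negative `z` indicates hypomethylation relative to the controls. The same calculation can be repeated per bin (e.g., per 1 Mb bin) using the control densities for that bin.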

Due to the noninvasive nature of this approach, we were able to serially assess the fetal and maternal plasma methylomes from maternal blood samples collected in the first trimester, third trimester and after delivery. Gestation-related changes were observed. The approach can also be applied to samples obtained during the second trimester. The fetal methylome deduced from maternal plasma during pregnancy resembled the placental methylome. Imprinted genes and differentially methylated regions were identified from the maternal plasma data.

We have therefore developed an approach to study the fetal methylome noninvasively, serially and comprehensively, thus offering the possibility for identifying biomarkers or direct testing of pregnancy-related pathologies. Embodiments can also be used to study the tumor methylome noninvasively, serially and comprehensively, for screening or detecting if a subject is suffering from cancer, for monitoring malignant diseases in a cancer patient and for prognostication. Embodiments can be applied to any cancer type, including, but not limited to, lung cancer, breast cancer, colorectal cancer, prostate cancer, nasopharyngeal cancer, gastric cancer, testicular cancer, skin cancer, cancer affecting the nervous system, bone cancer, ovarian cancer, liver cancer, hematologic malignancies, pancreatic cancer, endometriocarcinoma, kidney cancer etc.

A description of how to determine a methylome or methylation profile is first discussed, and then different methylomes are described (such as fetal methylomes, a tumor methylome, methylomes of the mother or a patient, and a mixed methylome, e.g., from plasma). The determination of a fetal methylation profile is then described using fetal-specific markers or by comparing a mixed methylation profile to a cellular methylation profile. Fetal methylation markers are determined by comparing methylation profiles. A relationship between size and methylation is discussed. Uses of methylation profiles to detect cancer are also provided.

I. Determination of a Methylome

A myriad of approaches have been used to investigate the placental methylome, but each approach has its limitations. For example, sodium bisulfite, a chemical that modifies unmethylated cytosine residues to uracil and leaves methylated cytosine unchanged, converts the differences in cytosine methylation into a genetic sequence difference for further interrogation. The gold standard method of studying cytosine methylation is based on treating tissue DNA with sodium bisulfite followed by direct sequencing of individual clones of bisulfite-converted DNA molecules. After the analysis of multiple clones of DNA molecules, the cytosine methylation pattern and quantitative profile per CpG site can be obtained. However, cloned bisulfite sequencing is a low throughput and labor-intensive procedure that cannot be readily applied on a genome-wide scale.
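The principle of bisulfite conversion described above can be illustrated with a minimal in silico Python sketch. It is a simplification under stated assumptions: only one strand is modeled, conversion is taken to be complete, converted cytosines are written as T (uracil is read as thymine after amplification), and the function names and the example sequence are hypothetical.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite treatment and PCR.

    Unmethylated cytosines are read as T; methylated cytosines
    (0-based indices in methylated_positions) remain C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(sequence.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylation(reference, converted_read):
    """Infer methylation at each reference cytosine from a converted read.

    Returns {position: True} where the read still shows C (methylated)
    and {position: False} where it shows T (unmethylated).
    """
    calls = {}
    for i, (ref, obs) in enumerate(zip(reference.upper(), converted_read)):
        if ref == "C":
            if obs == "C":
                calls[i] = True
            elif obs == "T":
                calls[i] = False
    return calls


# Hypothetical 8-base reference with a methylated cytosine at index 1.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions={1})
calls = call_methylation(reference, read)
```

Comparing the converted read with the reference recovers which cytosines were methylated, which is how the methylation difference becomes a sequence difference that sequencing can detect.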

Methylation-sensitive restriction enzymes that typically digest unmethylated DNA provide a low cost approach to study DNA methylation. However, data generated from such studies are limited to loci with the enzyme recognition motifs and the results are not quantitative. Immunoprecipitation of DNA bound by anti-methylated cytosine antibodies can be used to survey large segments of the genome but tends to bias towards loci with dense methylation due to higher strength of antibody binding to such regions. Microarray-based approaches are dependent on the a priori design of the interrogation probes and hybridization efficiencies between the probes and the target DNA.

To interrogate a methylome comprehensively, some embodiments use massively parallel sequencing (MPS) to provide genome-wide information and quantitative assessment of the level of methylation on a per nucleotide and per allele basis. Recently, bisulfite conversion followed by genome-wide MPS has become feasible (R. Lister et al. 2008 Cell; 133: 523-536).

Among the small number of published studies (R. Lister et al. 2009 Nature; 462: 315-322; L. Laurent et al. 2010 Genome Res; 20: 320-331; Y. Li et al. 2010 PLoS Biol; 8: e1000533; and M. Kulis et al. 2012 Nat Genet; 44: 1236-1242) that applied genome-wide bisulfite sequencing for the investigation of human methylomes, two studies focused on embryonic stem cells and fetal fibroblasts (R. Lister et al. 2009 and L. Laurent et al. 2010). Both studies analyzed cell-line derived DNA.

A. Genome-Wide Bisulfite Sequencing

Certain embodiments can overcome the aforesaid challenges and enable interrogation of a fetal methylome comprehensively, noninvasively and serially. In one embodiment, genome-wide bisulfite sequencing was used to analyze cell-free fetal DNA molecules that are found in the circulation of pregnant women. Despite the low abundance and fragmented nature of plasma DNA molecules, we were able to assemble a high resolution fetal methylome from maternal plasma and serially observe the changes with pregnancy progression. Given the intense interest in noninvasive prenatal testing (NIPT), embodiments can provide a powerful new tool for fetal biomarker discovery or serve as a direct platform for achieving NIPT of fetal or pregnancy-associated diseases. Data from the genome-wide bisulfite sequencing of various samples, from which the fetal methylome can be derived, is now provided. In one embodiment, this technology can be applied for methylation profiling in pregnancies complicated with preeclampsia, or intrauterine growth retardation, or preterm labor. For such complicated pregnancies, this technology can be used serially because of its noninvasive nature, to allow for the monitoring and/or prognostication and/or response to treatment.

FIG. 1A shows a table 100 of sequencing results for maternal blood, placenta, and maternal plasma according to embodiments of the present invention. In one embodiment, whole genome sequencing was performed on bisulfite-converted DNA libraries, prepared using methylated DNA library adaptors (Illumina) (R. Lister et al. 2008), of blood cells of the blood sample collected in the first trimester, the CVS, the placental tissue collected at term, the maternal plasma samples collected during the first and third trimesters and the postpartum period. Blood cell and plasma DNA samples obtained from one adult male and one adult non-pregnant female were also analyzed. A total of 9.5 billion pairs of raw sequence reads were generated in this study. The sequencing coverage of each sample is shown in table 100.

The sequence reads that were uniquely mappable to the human reference genome reached average haploid genomic coverages of 50-fold, 34-fold and 28-fold, respectively, for the first trimester, third trimester and post-delivery maternal plasma samples. The coverage of the CpG sites in the genome ranged from 81% to 92% for the samples obtained from the pregnancy. The sequence reads that spanned CpG sites amounted to average haploid coverages of 33-fold per strand, 23-fold per strand and 19-fold per strand, respectively, for the first trimester, third trimester and post-delivery maternal plasma samples. The bisulfite conversion efficiencies for all samples were >99.9% (table 100).

In table 100, ambiguous rate (marked “a”) refers to the proportion of reads mapped onto both the Watson and Crick strands of the reference human genome. Lambda conversion rate refers to the proportion of unmethylated cytosines in the internal lambda DNA control being converted to the “thymine” residues by bisulfite modification. H generically equates to A, C, or T. “a” refers to reads that could be mapped to a specific genomic locus but cannot be assigned to the Watson or Crick strand. “b” refers to paired reads with identical start and end coordinates. For “c”, lambda DNA was spiked into each sample before bisulfite conversion. The lambda conversion rate refers to the proportion of cytosine nucleotides that remain as cytosine after bisulfite conversion and is used as an indication of the rate of successful bisulfite conversion. “d” refers to the number of cytosine nucleotides present in the reference human genome and remaining as a cytosine sequence after bisulfite conversion.
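As a rough illustration of how a spike-in control can indicate conversion efficiency, the following Python sketch computes the proportion of lambda DNA cytosines read as thymine. It assumes that the spiked-in lambda DNA is unmethylated, so that any cytosine remaining after treatment reflects incomplete conversion. The function name and the counts are hypothetical and are not values from table 100.

```python
def bisulfite_conversion_rate(converted_c, unconverted_c):
    """Proportion of cytosines in the unmethylated lambda control that
    were read as thymine after bisulfite treatment."""
    total = converted_c + unconverted_c
    if total == 0:
        raise ValueError("no lambda cytosines were covered")
    return converted_c / total


# Hypothetical read-level counts at lambda DNA cytosine positions.
rate = bisulfite_conversion_rate(converted_c=999_500, unconverted_c=500)
```

With these illustrative counts the estimated efficiency is 99.95%. The complement, `1 - rate`, is the proportion of lambda cytosines that remained as cytosine.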

During bisulfite modification, unmethylated cytosines are converted to uracils and subsequently thymines after PCR amplifications while the methylated cytosines would remain intact (Frommer M, et al. 1992 Proc Natl Acad Sci USA; 89:1827-31). After sequencing and alignment, the methylation status of an individual CpG site could thus be inferred from the count of methylated sequence reads “M” (methylated) and the count of unmethylated sequence reads “U” (unmethylated) at the cytosine residue in CpG context. Using the bisulfite sequencing data, the entire methylomes of maternal blood, placenta and maternal plasma were constructed. The mean methylated CpG density (also called methylation density m) of specific loci in the maternal plasma can be calculated using the equation:

$m = \frac{M}{M + U}$

where M is the count of methylated reads and U is the count of unmethylated reads at the CpG sites within the genetic locus. If there is more than one CpG site within a locus, then M and U correspond to the counts summed across the sites.
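As an illustration only, the methylation density equation can be sketched in a few lines of Python. The function name `methylation_density` and the per-site tuple layout are assumptions made for this sketch and are not part of the described embodiments.

```python
def methylation_density(sites):
    """Return m = M / (M + U) for one locus.

    `sites` is an iterable of (methylated_reads, unmethylated_reads)
    pairs, one pair per CpG site in the locus (hypothetical layout).
    Counts are summed across sites before the ratio is taken.
    """
    M = sum(m for m, _ in sites)
    U = sum(u for _, u in sites)
    if M + U == 0:
        return None  # no read covers any CpG site in the locus
    return M / (M + U)


# Hypothetical locus with two CpG sites
print(methylation_density([(8, 2), (6, 4)]))
```

Summing M and U before dividing weights each site by its read depth, which matches the statement that M and U correspond to the counts across the sites.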

B. Various Techniques

As described above, methylation profiling can be performed using massively parallel sequencing (MPS) of bisulfite converted plasma DNA. The MPS of the bisulfite converted plasma DNA can be performed in a random or shotgun fashion. The depth of the sequencing can be varied according to the size of the region of interest.

In another embodiment, the region(s) of interest in the bisulfite converted plasma DNA can be first captured using a solution-phase or solid-phase hybridization-based process, followed by the MPS. The massively parallel sequencing can be performed using a sequencing-by-synthesis platform such as the Illumina platform, a sequencing-by-ligation platform such as the SOLiD platform from Life Technologies, a semiconductor-based sequencing system such as the Ion Torrent or Ion Proton platforms from Life Technologies, or a single molecule sequencing system such as the Helicos system or the Pacific Biosciences system or a nanopore-based sequencing system. Nanopore-based sequencing includes the use of nanopores that are constructed using lipid bilayers and protein nanopores, and solid-state nanopores (such as those that are graphene based). As selected single molecule sequencing platforms would allow the methylation status of DNA molecules (including N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine) to be elucidated directly without bisulfite conversion (B. A. Flusberg et al. 2010 Nat Methods; 7: 461-465; J. Shim et al. 2013 Sci Rep; 3:1389. doi: 10.1038/srep01389), the use of such platforms would allow the methylation status of non-bisulfite converted sample DNA (e.g. plasma DNA) to be analyzed.

Besides sequencing, other techniques can be used. In one embodiment, methylation profiling can be done by methylation-specific PCR or methylation-sensitive restriction enzyme digestion followed by PCR or ligase chain reaction followed by PCR. In yet other embodiments, the PCR is a form of single molecule or digital PCR (B. Vogelstein et al. 1999 Proc Natl Acad Sci USA; 96: 9236-9241). In yet further embodiments, the PCR can be a real-time PCR. In other embodiments, the PCR can be multiplex PCR.

II. Analysis of Methylomes

Some embodiments can determine the methylation profile of plasma DNA using whole genome bisulfite sequencing. The methylation profile of a fetus can be determined by sequencing maternal plasma DNA samples, as is described below. Thus, the fetal DNA molecules (and fetal methylome) were accessed noninvasively during the pregnancy, and changes were monitored serially as the pregnancy progressed. Due to the comprehensiveness of the sequencing data, we were able to study the maternal plasma methylomes on a genome-wide scale at single nucleotide resolution.

Since the genomic coordinates of the sequenced reads were known, these data enabled one to study the overall methylation levels of the methylome or any region of interest in the genome and to make comparisons between different genetic elements. In addition, multiple sequence reads covered each CpG site or locus. A description of some of the metrics used to measure the methylome is now provided.

A. Methylation of Plasma DNA Molecules

DNA molecules are present in human plasma at low concentrations and in a fragmented form, typically in lengths resembling mononucleosomal units (Y. M. D. Lo et al. 2010 Sci Transl Med; 2: 61ra91; and Zheng et al. 2012 Clin Chem; 58: 549-558). Despite these limitations, a genome-wide bisulfite-sequencing pipeline was able to analyze the methylation of the plasma DNA molecules. In other embodiments, as selected single molecule sequencing platforms would allow the methylation status of DNA molecules to be elucidated directly without bisulfite conversion (Flusberg B A et al. 2010 Nat Methods; 7: 461-465; Shim J et al. 2013 Sci Rep; 3:1389. doi: 10.1038/srep01389), the use of such platforms would allow the non-bisulfite converted plasma DNA to be used to determine the methylation levels of plasma DNA or to determine the plasma methylome. Such platforms can detect N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine.

FIG. 1B shows methylation density in 1-Mb windows of sequenced samples according to embodiments of the present invention. Plot 150 is a Circos plot depicting the methylation density in the maternal plasma and genomic DNA in 1-Mb windows across the genome. From outside to inside: chromosome ideograms can be oriented pter-qter in a clockwise direction (centromeres are shown in red), maternal blood (red), placenta (yellow), maternal plasma (green), shared reads in maternal plasma (blue), and fetal-specific reads in maternal plasma (purple). The overall CpG methylation levels (i.e., density levels) of maternal blood cells, placenta and maternal plasma can be found in table 100. The methylation level of maternal blood cells is in general higher than that of the placenta across the whole genome.

B. Comparison of Bisulfite Sequencing to Other Techniques

We studied the placental methylome using massively parallel bisulfite sequencing. In addition, we studied the placental methylome using an oligonucleotide array platform that covered about 480,000 CpG sites in the human genome (Illumina) (M. Kulis et al. 2012 Nat Genet; 44: 1236-1242; and C. Clark et al. 2012 PLoS One; 7: e50233). In one embodiment using beadchip-based genotyping and methylation analysis, genotyping was performed using the Illumina HumanOmni2.5-8 genotyping array according to the manufacturer's protocol. Genotypes were called using the GenCall algorithm of the Genome Studio Software (Illumina). The call rates were over 99%. For the microarray based methylation analysis, genomic DNA (500-800 ng) was treated with sodium bisulfite using the Zymo EZ DNA Methylation Kit (Zymo Research, Orange, Calif., USA) according to the manufacturer's recommendations for the Illumina Infinium Methylation Assay.

The methylation assay was performed on 4 μl bisulfite-converted genomic DNA at 50 ng/μl according to the Infinium HD Methylation Assay protocol. The hybridized beadchip was scanned on an Illumina iScan instrument. DNA methylation data were analyzed by the GenomeStudio (v2011.1) Methylation Module (v1.9.0) software, with normalization to internal controls and background subtraction. The methylation index for an individual CpG site was represented by a beta value (β), which was calculated using the ratio of fluorescent intensities between methylated and unmethylated alleles:

$\beta = \frac{\text{Intensity of methylated allele}}{\text{Intensity of unmethylated allele} + \text{Intensity of methylated allele} + 100}$

For CpG sites that were represented on the array and sequenced to coverage of at least 10-fold, we compared the beta-value obtained by the array to the methylation index as determined by sequencing of the same site. Beta-values represented the intensity of methylated probes as a proportion of the combined intensity of the methylated and unmethylated probes covering the same CpG site. The methylation index for each CpG site refers to the proportion of methylated reads over the total number of reads covering that CpG.
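The two per-site measures being compared can be sketched as follows. This is a minimal illustration, assuming hypothetical function names (`beta_value`, `methylation_index`) and hypothetical input values; the offset of 100 and the minimum coverage of 10 are taken from the text above.

```python
def beta_value(meth_intensity, unmeth_intensity, offset=100.0):
    """Array-based beta value: methylated intensity over the combined
    methylated and unmethylated intensities plus a constant offset."""
    return meth_intensity / (unmeth_intensity + meth_intensity + offset)


def methylation_index(meth_reads, total_reads, min_coverage=10):
    """Sequencing-based methylation index for one CpG site.

    Returns None when the site is covered by fewer than `min_coverage`
    reads, mirroring the coverage filter used for the comparison.
    """
    if total_reads < min_coverage:
        return None
    return meth_reads / total_reads


# Hypothetical CpG site measured on both platforms
print(beta_value(4500.0, 400.0))
print(methylation_index(9, 12))
```

The constant offset in the denominator keeps the beta value stable when both intensities are low, so a beta value is always slightly below the raw intensity ratio.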

FIGS. 2A-2C show plots of the beta-values determined by the Illumina Infinium HumanMethylation 450K beadchip array against the methylation indices determined by genome-wide bisulfite sequencing of corresponding CpG sites that were interrogated by both platforms: (A) Maternal blood cells, (B) Chorionic villus sample, (C) Term placental tissue. The data from both platforms were highly concordant and the Pearson correlation coefficients were 0.972, 0.939 and 0.954, and R² values were 0.945, 0.882 and 0.910 for the maternal blood cells, CVS and term placental tissue, respectively.
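As a quick consistency check, the reported R² values agree with the squares of the reported Pearson coefficients, as expected for a simple linear fit between two variables. The short sketch below only re-derives the numbers quoted above; the helper name `r_squared` is an assumption for illustration.

```python
def r_squared(pearson_r, ndigits=3):
    """Square a Pearson coefficient and round to the reported precision."""
    return round(pearson_r ** 2, ndigits)


# Pearson coefficients quoted for maternal blood cells, CVS and term placenta
for label, r in [("maternal blood cells", 0.972),
                 ("CVS", 0.939),
                 ("term placental tissue", 0.954)]:
    print(label, r_squared(r))
```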

We further compared our sequencing data with those reported by Chu et al., who investigated the methylation profiles of 12 pairs of CVS and maternal blood cell DNA samples using an oligonucleotide array that covered about 27,000 CpG sites (T. Chu et al. 2011 PLoS One; 6: e14723). Comparing the sequencing results of the CVS and maternal blood cell DNA with each of the 12 pairs of samples in the previous study gave an average Pearson coefficient of 0.967 and R² of 0.935 for maternal blood, and an average Pearson coefficient of 0.943 and R² of 0.888 for the CVS. Among the CpG sites represented on both arrays, our data correlated highly with the published data. The rates of non-CpG methylation were <1% for the maternal blood cells, CVS and placental tissues (table 100). These results were consistent with the current belief that substantial amounts of non-CpG methylation were mainly restricted to pluripotent cells (R. Lister et al. 2009 and L. Laurent et al. 2010).

C. Comparison of Plasma and Blood Methylomes for Non-Pregnant Subjects

FIGS. 3A and 3B show bar charts of percentage of methylated CpG sites in plasma and blood cells collected from an adult male and a non-pregnant adult female: (A) Autosomes, (B) Chromosome X. The charts show a similarity between plasma and blood methylomes of a male and a non-pregnant female. The overall proportions of CpG sites that were methylated in the male and non-pregnant female plasma samples were almost the same as the corresponding blood cell DNA (table 100 and FIGS. 3A and 3B).

We next studied the correlation of the methylation profiles of the plasma and blood cell samples in a locus-specific manner. We determined the methylation density of each 100-kb bin in the human genome by determining the total number of unconverted cytosines at CpG sites as a proportion of all CpG sites covered by sequence reads mapped to the 100-kb region. The methylation densities were highly concordant between the plasma sample and corresponding blood cell DNA of the male as well as the female samples.
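The binning step can be sketched as below. This is only an illustrative sketch: the input layout (one tuple of chromosome, position, unconverted count and converted count per covered CpG site) and the function name `binned_methylation_density` are assumptions, not the pipeline actually used.

```python
from collections import defaultdict


def binned_methylation_density(cpg_calls, bin_size=100_000):
    """Aggregate per-CpG read counts into fixed-size genomic bins.

    `cpg_calls` yields (chrom, position, unconverted, converted) tuples,
    where `unconverted` counts reads showing a cytosine (methylated) and
    `converted` counts reads showing a thymine (unmethylated).
    Returns {(chrom, bin_index): density} for bins with any coverage.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, pos, unconverted, converted in cpg_calls:
        key = (chrom, pos // bin_size)
        counts[key][0] += unconverted
        counts[key][1] += converted
    return {key: m / (m + u) for key, (m, u) in counts.items() if m + u > 0}


# Hypothetical calls: two CpG sites in the first bin, one in the second
calls = [("chr1", 50_000, 3, 1), ("chr1", 99_999, 1, 3), ("chr1", 100_000, 2, 0)]
print(binned_methylation_density(calls))
```

The same function applied with `bin_size=1_000_000` would give the 1-Mb densities used for the Circos plots.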

FIGS. 4A and 4B show plots of methylation densities of corresponding loci in blood cell DNA and plasma DNA: (A) Non-pregnant adult female, (B) Adult male. The Pearson correlation coefficient and R² value for the non-pregnant female samples were respectively 0.963 and 0.927, and those for the male samples were respectively 0.953 and 0.908. These data are consistent with previous findings based on the assessment of genotypes of plasma DNA molecules of recipients of allogeneic hematopoietic stem cell transplantation which showed that hematopoietic cells are the predominant source of DNA in human plasma (Zheng et al., 2012).

D. Methylation Levels Across Methylomes

We next studied the DNA methylation levels of maternal plasma DNA, maternal blood cells, and placental tissue. The levels were determined for repeat regions, non-repeat regions, and overall.

FIGS. 5A and 5B show bar charts of percentage of methylated CpG sites among samples collected from the pregnancy: (A) Autosomes, (B) Chromosome X. The overall proportions of methylated CpGs were 67.0% and 68.2% for the first and third trimester maternal plasma samples, respectively. Unlike the results obtained from the non-pregnant individuals, these proportions were lower than that of the first trimester maternal blood cell sample but higher than those of the CVS and term placental tissue samples (table 100). Of note, the percentage of methylated CpGs for the post-delivery maternal plasma sample was 73.1%, which was similar to the blood cell data (table 100). These trends were observed in CpGs distributed over all autosomes as well as chromosome X and spanned across both the non-repeat regions and multiple classes of repeat elements of the human genome.

Both the repeat and non-repeat elements in the placenta were found to be hypomethylated relative to maternal blood cells. The results were concordant with findings in the literature that the placenta is hypomethylated relative to other tissues, including peripheral blood cells.

Between 71% and 72% of the sequenced CpG sites were methylated in the blood cell DNA from the pregnant woman, non-pregnant woman and adult male (table 100 of FIG. 1). These data are comparable with the 68.4% of CpG sites of blood mononuclear cells reported by Li et al. 2010. Consistent with the previous reports on the hypomethylated nature of placental tissues, 55% and 59% of the CpG sites were methylated in the CVS and term placental tissue, respectively (table 100).

FIG. 6 shows a bar chart of methylation level of different repeat classes of the human genome for maternal blood, placenta and maternal plasma. The repeat classes are as defined by the UCSC genome browser. Data shown are from the first trimester samples. Unlike earlier data suggesting that the hypomethylated nature of placental tissues was mainly observed in certain repeat classes in the genome (B. Novakovic et al. 2012 Placenta; 33: 959-970), here we show that the placenta was in fact hypomethylated in most classes of genomic elements with reference to blood cells.

E. Similarity of Methylomes

Embodiments can determine the methylomes of placental tissues, blood cells and plasma using the same platform. Hence, direct comparisons of the methylomes of those biological sample types were possible. The high level of resemblance between methylomes of the blood cells and plasma for the male and non-pregnant female as well as between the maternal blood cells and the post-delivery maternal plasma sample further affirmed that hematopoietic cells were the main sources of DNA in human plasma (Zheng et al., 2012).

The resemblances are evident both in terms of the overall proportion of methylated CpGs in the genome as well as from the high correlation of methylation densities between corresponding loci in the blood cell DNA and plasma DNA. Yet, the overall proportions of methylated CpGs in the first trimester and third trimester maternal plasma samples were reduced when compared with the maternal blood cell data or the post-delivery maternal plasma sample. The reduced methylation levels during pregnancy were due to the hypomethylated nature of the fetal DNA molecules present in maternal plasma.

The reversal of the methylation profile in the post-delivery maternal plasma sample to become more similar to that of the maternal blood cells suggests that the fetal DNA molecules had been removed from the maternal circulation. Calculation of the fetal DNA concentrations based on SNP markers of the fetus indeed showed that the concentration changed from 33.9% before delivery to just 4.5% in the post-delivery sample.

F. Other Applications

Embodiments have successfully assembled DNA methylomes through the MPS analysis of plasma DNA. The ability to determine the placental or fetal methylome from maternal plasma provides a noninvasive method to determine, detect and monitor the aberrant methylation profiles associated with pregnancy-associated conditions such as preeclampsia, intrauterine growth restriction, preterm labor and others. For example, the detection of a disease-specific aberrant methylation signature allows the screening, diagnosis and monitoring of such pregnancy-associated conditions. The measuring of the maternal plasma methylation level allows the screening, diagnosis and monitoring of such pregnancy-associated conditions. Besides the direct applications on the investigation of pregnancy-associated conditions, the approach could be applied to other areas of medicine where plasma DNA analysis is of interest. For example, the methylomes of cancers could be determined from plasma DNA of cancer patients. Cancer methylomic analysis from plasma, as described herein, is potentially a synergistic technology to cancer genomic analysis from plasma (K. C. A. Chan et al. 2013 Clin Chem; 59:211-224 and Leary R J et al. 2012 Sci Transl Med; 4:162ra154).

For example, the determination of a methylation level of a plasma sample could be used to screen for cancer. When the methylation level of the plasma sample shows aberrant levels compared with healthy controls, cancer may be suspected. Then further confirmation and assessment of the type of cancer or tissue origin of the cancer may be performed by determining the plasma profile of methylation at different genomic loci or by plasma genomic analysis to detect tumor-associated copy number aberrations, chromosomal translocations and single nucleotide variants. Alternatively, radiological and imaging investigations (e.g. computed tomography, magnetic resonance imaging, positron emission tomography) or endoscopy (e.g. upper gastrointestinal endoscopy or colonoscopy) could be used to further investigate individuals who were suspected of having cancer based on the plasma methylation level analysis.

For cancer screening or detection, the determination of a methylation level of a plasma (or other biologic) sample can be used in conjunction with other modalities for cancer screening or detection such as prostate specific antigen measurement (e.g. for prostate cancer), carcinoembryonic antigen (e.g. for colorectal carcinoma, gastric carcinoma, pancreatic carcinoma, lung carcinoma, breast carcinoma, medullary thyroid carcinoma), alpha fetoprotein (e.g. for liver cancer or germ cell tumors) and CA19-9 (e.g. for pancreatic carcinoma).

Additionally, other tissues may be sequenced to obtain a cellular methylome. For example, liver tissue can be analyzed to determine a methylation pattern specific to the liver, which may be used to identify liver pathologies. Other tissues which can also be analyzed include brain cells, bones, the lungs, the heart, the muscles and the kidneys, etc. The methylation profiles of various tissues may change from time to time, e.g. as a result of development, aging, disease processes (e.g. inflammation or cirrhosis) or treatment (e.g. treatment with demethylating agents such as 5-azacytidine and 5-azadeoxycytidine). The dynamic nature of DNA methylation makes such analysis potentially very valuable for monitoring of physiological and pathological processes. For example, if one detects a change in the plasma methylome of an individual compared to a baseline value obtained when they were healthy, one could then detect disease processes in organs that contribute plasma DNA.

Also, the methylomes of transplanted organs could be determined from plasma DNA of organ transplantation recipients. Transplant methylomic analysis from plasma, as described in this invention, is potentially a synergistic technology to transplant genomic analysis from plasma (Y. W. Zheng et al. 2012; Y. M. D. Lo et al. 1998 Lancet; 351: 1329-1330; and T. M. Snyder et al. 2011 Proc Natl Acad Sci USA; 108: 6229-6234).

III. Determining Fetal Methylome Using SNPs

As described above, the plasma methylome corresponds to the blood methylome for a non-pregnant normal person. However, for a pregnant female, the methylomes differ. Fetal DNA molecules circulate in maternal plasma among a majority background of maternal DNA (Y. M. D. Lo et al. 1998 Am J Hum Genet; 62: 768-775).

Thus, for a pregnant female, the plasma methylome is largely a composite of the placental methylome and the blood methylome. Accordingly, one can extract the placental methylome from plasma.

In one embodiment, single nucleotide polymorphism (SNP) differences between the mother and the fetus are used to identify the fetal DNA molecules in maternal plasma. An aim was to identify SNP loci where the mother was homozygous but the fetus was heterozygous; the fetal-specific allele can be used to determine which DNA fragments are from the fetus. Genomic DNA from the maternal blood cells was analyzed using a SNP genotyping array, the Illumina HumanOmni2.5-8.

A. Correlation of Methylation of Fetal-Specific Reads and Placental Methylome

Loci having two different alleles, where the amount of one allele (B) was significantly less than the other allele (A), were identified from sequencing results of a biological sample. Reads covering the B alleles were regarded as fetal-specific (fetal-specific reads). The mother is determined to be homozygous for A and the fetus heterozygous for A/B, and thus reads covering the A allele were shared by the mother and fetus (shared reads).

The mother was found to be homozygous at 1,945,516 loci on the autosomes. The maternal plasma DNA sequencing reads that covered these SNPs were inspected. Reads carrying a non-maternal allele were detected at 107,750 loci and these were considered the informative loci. At each informative SNP, the allele that was not from the mother was termed a fetal-specific allele while the other one was termed a shared allele.

A fractional fetal/tumor DNA concentration (also called fetal DNA percentage) in the maternal plasma can be determined. In one embodiment, the fractional fetal DNA concentration in the maternal plasma, f, is determined by the equation:

$f = \frac{2p}{p + q}$

where p is the number of sequenced reads with the fetal-specific allele and q is the number of sequenced reads with the shared allele between the mother and the fetus (Y. M. D. Lo et al. 2010 Sci Transl Med; 2:61ra91). The fetal DNA proportions in the first trimester, third trimester and post-delivery maternal plasma samples were found to be 14.4%, 33.9% and 4.5%, respectively. The fetal DNA proportions were also calculated using the numbers of reads that aligned to chromosome Y. Based on the chromosome Y data, the results were 14.2%, 34.9% and 3.7%, respectively, in the first trimester, third trimester and post-delivery maternal plasma samples.
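The equation for f can be illustrated with a short sketch. The factor of 2 arises because the fetus is heterozygous at the informative loci, so the fetal-specific allele is carried by only half of the fetal DNA molecules at those loci. The function name `fetal_fraction` and the read counts in the example are hypothetical and chosen only for illustration.

```python
def fetal_fraction(p, q):
    """Fractional fetal DNA concentration f = 2p / (p + q).

    p: reads carrying the fetal-specific allele
    q: reads carrying the allele shared by mother and fetus
    """
    if p + q == 0:
        raise ValueError("no reads cover the informative loci")
    return 2 * p / (p + q)


# Hypothetical counts pooled over all informative SNPs
print(fetal_fraction(72, 928))
```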

By separately analyzing the fetal-specific or shared sequence reads, embodiments demonstrate that the circulating fetal DNA molecules were much more hypomethylated than the background DNA molecules. Comparisons of the methylation densities of corresponding loci in the fetal-specific maternal plasma reads and the placental tissue data for both the first and third trimesters revealed high levels of correlation. These data provided genome level evidence that the placenta is the predominant source of fetal-derived DNA molecules in maternal plasma and represented a major step forward compared with previous evidence based on information derived from selected loci.

We determined the methylation density of each 1-Mb region in the genome using either the fetal-specific or shared reads that covered CpG sites adjacent to the informative SNPs. The fetal and non-fetal-specific methylomes assembled from the maternal plasma sequence reads can be displayed, for example, in Circos plots (M. Krzywinski et al. 2009 Genome Res; 19: 1639-1645). The methylation densities per 1-Mb bin were also determined for the maternal blood cells and placental tissue samples.

FIG. 7A shows a Circos plot 700 for first trimester samples. FIG. 7B shows a Circos plot 750 for third trimester samples. The plots 700 and 750 show methylation density per 1-Mb bin. Chromosome ideograms (outermost ring) are oriented pter-qter in a clockwise direction (centromeres are shown in red). The second outermost track shows the number of CpG sites in the corresponding 1-Mb regions up to 20,000 sites. The methylation densities of the corresponding 1-Mb regions are shown in the other tracks based on the color scheme shown in the center.

For the first trimester samples (FIG. 7A), from inside to outside, the tracks are: chorionic villus sample, fetal-specific reads in maternal plasma, maternal-specific reads in maternal plasma, combined fetal and non-fetal reads in maternal plasma, and maternal blood cells. For the third trimester samples (FIG. 7B), the tracks are: term placental tissue, fetal-specific reads in maternal plasma, maternal-specific reads in maternal plasma, combined fetal and non-fetal reads in maternal plasma, post-delivery maternal plasma and maternal blood cells (from the first trimester blood sample). It can be appreciated that for both the first and third trimester plasma samples, the fetal methylomes were more hypomethylated than the non-fetal-specific methylomes.

The overall methylation profile of the fetal methylomes more closely resembled that of the CVS or placental tissue samples. On the contrary, the DNA methylation profile of the shared reads in plasma, which were predominantly maternal DNA, more closely resembled that of the maternal blood cells. We then performed a systematic locus-by-locus comparison of the methylation densities of the maternal plasma DNA reads and the maternal or fetal tissues. We determined the methylation densities of CpG sites that were present on the same sequence read as the informative SNPs and were covered by at least 5 maternal plasma DNA sequence reads.

FIGS. 8A-8D show plots of comparisons of the methylation densities of genomic tissue DNA against maternal plasma DNA for CpG sites surrounding the informative single nucleotide polymorphisms. FIG. 8A shows methylation densities for fetal-specific reads in the first trimester maternal plasma sample relative to methylation densities for reads in a CVS sample. As can be seen, the fetal-specific values correspond well to the CVS values.

FIG. 8B shows methylation densities for fetal-specific reads in the third trimester maternal plasma sample relative to methylation densities for reads in a term placental tissue. Again, the sets of densities correspond well, indicating that a fetal methylation profile can be obtained by analyzing reads with fetal-specific alleles.

FIG. 8C shows methylation densities for shared reads in the first trimester maternal plasma sample relative to methylation densities for reads in maternal blood cells. Given that most of the shared reads are from the mother, the two sets of values correspond well. FIG. 8D shows methylation densities for shared reads in the third trimester maternal plasma sample relative to methylation densities for reads in maternal blood cells.

For the fetal-specific reads in maternal plasma, the Spearman correlation coefficient between the first trimester maternal plasma and the CVS was 0.705 (P<2.2×10⁻¹⁶); and that between the third trimester maternal plasma and term placental tissue was 0.796 (P<2.2×10⁻¹⁶) (FIGS. 8A and 8B). A similar comparison was performed for the shared reads in maternal plasma with the maternal blood cell data. The Pearson correlation coefficient was 0.653 (P<2.2×10⁻¹⁶) for the first trimester plasma sample and was 0.638 (P<2.2×10⁻¹⁶) for the third trimester plasma sample (FIGS. 8C and 8D).

B. Fetal Methylome

In one embodiment, to assemble the fetal methylome from maternal plasma, we sorted for sequence reads that spanned at least one informative fetal SNP site and contained at least one CpG site within the same read. Reads that showed the fetal-specific alleles were included in the assembly of the fetal methylome. Reads that showed the shared allele, i.e. non-fetal-specific allele, were included in the assembly of the non-fetal-specific methylome which was predominantly comprised of maternal-derived DNA molecules.

The fetal-specific reads covered 218,010 CpG sites on the autosomes for the first trimester maternal plasma sample. The corresponding figures for the third trimester and post-delivery maternal plasma samples were 263,611 and 74,020, respectively. The shared reads covered those CpG sites an average of 33.3, 21.7 and 26.3 times, respectively. The fetal-specific reads covered those CpG sites 3.0, 4.4 and 1.8 times, respectively, for the first trimester, third trimester and post-delivery maternal plasma samples.

Fetal DNA represents a minor population in maternal plasma and therefore the coverage of those CpG sites by fetal-specific reads was proportional to the fetal DNA percentage of the sample. For the first trimester maternal plasma sample, the overall percentage of methylated CpG among the fetal reads was 47.0%, while that for the shared reads was 68.1%. For the third trimester maternal plasma sample, the percentage of methylated CpG of the fetal reads was 53.3%, while that for the shared reads was 68.8%. These data showed that the fetal-specific reads in maternal plasma were more hypomethylated than the shared reads in maternal plasma.

C. Method

The techniques described above can also be used to determine a tumor methylation profile. Methods for determining fetal and tumor methylation profiles are now described.

FIG. 9 is a flowchart illustrating a method 900 for determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention. Method 900 can construct an epigenetic map of the fetus from the methylation profile of maternal plasma. The biological sample includes cell-free DNA comprising a mixture of cell-free DNA originating from a first tissue and from a second tissue. As examples, the first tissue can be from a fetus, a tumor, or a transplanted organ.

At block 910, a plurality of DNA molecules are analyzed from the biological sample. The analysis of a DNA molecule can include determining a location of the DNA molecule in a genome of the organism, determining a genotype of the DNA molecule, and determining whether the DNA molecule is methylated at one or more sites.

In one embodiment, the DNA molecules are analyzed using sequence reads of the DNA molecules, where the sequencing is methylation aware. Thus, the sequence reads include methylation status of DNA molecules from the biological sample. The sequence reads can be obtained from various sequencing techniques, PCR-techniques, arrays, and other suitable techniques for identifying sequences of fragments. The methylation status of sites of the sequence read can be obtained as described herein.

At block 920, a plurality of first loci are identified at which a first genome of the first tissue is heterozygous for a respective first allele and a respective second allele and a second genome of the second tissue is homozygous for the respective first allele. For example, fetal-specific reads may be identified at the plurality of first loci. Or, tumor-specific reads may be identified at the plurality of first loci. The tissue-specific reads can be identified from sequencing reads where the percentage of sequence reads of the second allele falls within a particular range, e.g., about 3%-25%, thereby indicating a minority population of DNA fragments from a heterozygous genome at the locus and a majority population from a homozygous genome at the locus.
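One possible way to select such loci from allele counts is sketched below. The function name `informative_loci`, the dictionary layout and the locus identifiers are assumptions made for illustration; the default bounds simply reuse the example range of about 3%-25% given above and would need tuning for a real sample.

```python
def informative_loci(allele_counts, low=0.03, high=0.25):
    """Select loci whose second-allele read fraction lies in [low, high].

    `allele_counts` maps a locus identifier to a pair
    (first_allele_reads, second_allele_reads). A second-allele fraction
    in the range suggests a minority heterozygous genome mixed into a
    majority homozygous genome at that locus.
    """
    selected = {}
    for locus, (first_reads, second_reads) in allele_counts.items():
        total = first_reads + second_reads
        if total == 0:
            continue  # locus not covered by any read
        fraction = second_reads / total
        if low <= fraction <= high:
            selected[locus] = fraction
    return selected


# Hypothetical loci: only "snp1" falls in the minority-allele range
counts = {"snp1": (90, 10), "snp2": (50, 50), "snp3": (100, 0)}
print(informative_loci(counts))
```

A locus such as "snp2" with a roughly even split is more consistent with a heterozygous majority genome, and a locus such as "snp3" carries no second allele at all, so both are excluded.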

At block 930, DNA molecules located at one or more sites of each of the first loci are analyzed. A number of DNA molecules that are methylated at a site and correspond to the respective second allele of the locus is determined. There may be more than one site per locus. For example, a SNP might indicate that a fragment is fetal-specific, and that fragment may have multiple sites whose methylation status is determined. The number of reads at each site that are methylated can be determined, and the total number of methylated reads for the locus can be determined.

The locus may be defined by a specific number of sites, a specific set of sites, or a particular size for a region around a variation that comprises the tissue-specific allele. A locus can have just one site. The sites can have specific properties, e.g., be CpG sites. The determination of a number of reads that are unmethylated is equivalent, and is encompassed within the determination of the methylation status.

At block 940, for each of the first loci, a methylation density is calculated based on the numbers of DNA molecules methylated at the one or more sites of the locus and corresponding to the respective second allele of the locus. For example, a methylation density can be determined for CpG sites corresponding to a locus.

At block 950, the first methylation profile of the first tissue is created from the methylation densities for the first loci. The first methylation profile can correspond to particular sites, e.g., CpG sites. The methylation profile can be for all loci having a fetal-specific allele, or just some of those loci.
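Blocks 930 to 950 can be sketched together as follows. This is a minimal sketch under stated assumptions: each read is represented as a hypothetical tuple (locus, allele, methylated site count, unmethylated site count), the mapping `second_allele` from locus to tissue-specific allele is assumed to come from block 920, and the function name `tissue_methylation_profile` is invented for illustration.

```python
from collections import defaultdict


def tissue_methylation_profile(reads, second_allele):
    """Build a per-locus methylation profile from tissue-specific reads.

    reads: iterable of (locus, allele, methylated_sites, unmethylated_sites)
    second_allele: dict mapping each first locus to its tissue-specific allele
    Returns {locus: methylation density} using only reads that carry the
    tissue-specific (second) allele of their locus.
    """
    counts = defaultdict(lambda: [0, 0])
    for locus, allele, methylated, unmethylated in reads:
        if second_allele.get(locus) != allele:
            continue  # shared-allele read or locus not informative
        counts[locus][0] += methylated      # block 930: methylated counts
        counts[locus][1] += unmethylated
    # block 940: density per locus; block 950: the collection is the profile
    return {locus: m / (m + u) for locus, (m, u) in counts.items() if m + u > 0}


# Hypothetical reads at two informative loci
reads = [("L1", "B", 2, 1), ("L1", "A", 3, 0), ("L1", "B", 1, 2), ("L2", "B", 0, 4)]
print(tissue_methylation_profile(reads, {"L1": "B", "L2": "B"}))
```

Passing the shared allele in `second_allele` instead would assemble the complementary, predominantly second-tissue profile from the same reads.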

IV. Using Difference of Plasma and Blood Methylomes

Above, it was shown that the fetal-specific reads from plasma correlate with the placental methylome. As the maternal component of the maternal plasma methylome is primarily contributed by the blood cells, the difference between the plasma methylome and blood methylome can be used to determine the placental methylome for all loci, and not just locations of fetal-specific alleles.

A. Method

FIG. 10 is a flowchart illustrating a method 1000 of determining a first methylation profile from a biological sample of an organism according to embodiments of the present invention. The biological sample (e.g., plasma) includes cell-free DNA comprising a mixture of cell-free DNA originating from a first tissue and from a second tissue. The first methylation profile corresponds to a methylation profile of the first tissue (e.g., fetal tissue or tumor tissue). Method 1000 can provide a deduction of differentially methylated regions from maternal plasma.

At block 1010, a biological sample is received. The biological sample could simply be received at a machine (e.g., a sequencing machine). The biological sample may be in the form taken from the organism or may be in a processed form, e.g., the sample may be plasma that is extracted from a blood sample.

At block 1020, a second methylation profile corresponding to DNA of the second tissue is obtained. The second methylation profile could be read from memory, as it may have been determined previously. The second methylation profile can be determined from the second tissue, e.g., a different sample that contains only or predominantly cells of the second tissue. The second methylation profile can correspond to a cellular methylation profile and be obtained from cellular DNA. As another example, the second profile can be determined from a plasma sample collected before pregnancy, or before development of cancer, because the plasma methylome of a non-pregnant person without cancer is very similar to the methylome of blood cells.

The second methylation profile can provide a methylation density at each of a plurality of loci in a genome of the organism. The methylation density at a particular locus corresponds to a proportion of DNA of the second tissue that is methylated. In one embodiment, the methylation density is a CpG methylation density, where CpG sites associated with the locus are used to determine the methylation density. If there is one site for a locus, then the methylation density can equal the methylation index. The methylation density also corresponds to an unmethylation density as the two values are complementary.

In one embodiment, the second methylation profile is obtained by performing methylation-aware sequencing of cellular DNA from a sample of the organism. One example of methylation-aware sequencing includes treating DNA with sodium bisulfite and then performing DNA sequencing. In another example, the methylation-aware sequencing can be performed without using sodium bisulfite, using a single molecule sequencing platform that would allow the methylation status of DNA molecules (including N6-methyladenine, 5-methylcytosine and 5-hydroxymethylcytosine) to be elucidated directly without bisulfite conversion (A. B. Flusberg et al. 2010 Nat Methods; 7: 461-465; Shim J et al. 2013 Sci Rep; 3:1389. doi: 10.1038/srep01389); or through the immunoprecipitation of methylated cytosine followed by sequencing; or through the use of methylation-sensitive restriction enzymes followed by sequencing. In another embodiment, non-sequencing techniques are used, such as arrays and digital PCR.

In another embodiment, the second methylation density of the second tissue could be obtained previously from control samples of the subject or from other subjects. The methylation density from another subject can act as a reference methylation profile having reference methylation densities. The reference methylation densities can be determined from multiple samples, where a mean level (or other statistical value) of the different methylation densities at a locus can be used as the reference methylation density at the locus.

At block 1030, a cell-free methylation profile is determined from the cell-free DNA of the mixture. The cell-free methylation profile provides a methylation density at each of the plurality of loci. The cell-free methylation profile can be determined by receiving sequence reads from a sequencing of the cell-free DNA, where the methylation information is obtained with the sequence reads. The cell-free methylation profile can be determined in a same manner as the cellular methylome.

At block 1040, a percentage of the cell-free DNA from the first tissue in the biological sample is determined. In one embodiment, the first tissue is fetal tissue, and the corresponding DNA is fetal DNA. In another embodiment, the first tissue is tumor tissue, and the corresponding DNA is tumor DNA. The percentage can be determined in a variety of ways, e.g., using a fetal-specific allele or a tumor-specific allele. Copy number can also be used to determine the percentage, e.g., as described in U.S. patent application Ser. No. 13/801,748 entitled “Mutational Analysis Of Plasma DNA For Cancer Detection” filed on Mar. 13, 2013, which is incorporated by reference.

At block 1050, a plurality of loci for determining the first methylome are identified. These loci may correspond to each of the loci used to determine the cell-free methylation profile and the second methylation profile, so that the same loci are covered in each profile. It is also possible that more loci are used to determine the cell-free methylation profile and the second methylation profile than are used to determine the first methylome.

In some embodiments, loci that were hypermethylated or hypomethylated in the second methylation profile can be identified, e.g., using maternal blood cells. To identify the loci that were hypermethylated in the maternal blood cells, one can scan from one end of a chromosome for a CpG site with a methylation index ≥80%. One can then search for the next CpG site within the downstream 200-bp region. If the immediately downstream CpG site also has a methylation index ≥80%, the first and the second CpG sites can be grouped. The grouping can continue until either there is no other CpG site within the next downstream region of 200 bp, or the immediately downstream CpG site has a methylation index <80%. The region of the grouped CpG sites can be reported as hypermethylated in maternal blood cells if the region contains at least five immediately adjacent hypermethylated CpG sites. A similar analysis can be performed to search for loci that were hypomethylated in maternal blood cells, using CpG sites with methylation indices ≤20%. The methylation densities for the second methylation profile can be calculated for the short-listed loci and used to deduce the first methylation profile (e.g., placental tissue methylation density) of the corresponding loci, e.g., from maternal plasma bisulfite-sequencing data.
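The scan for hypermethylated regions described above can be sketched as follows. This is an illustrative Python sketch only; the function name, parameter names, and the input layout (a position-sorted list of CpG sites for one chromosome) are assumptions. The thresholds of 80%, 200 bp, and five sites are taken from the description.

```python
def find_hypermethylated_regions(sites, min_index=0.8, max_gap=200, min_sites=5):
    """Group adjacent hypermethylated CpG sites into regions.

    sites: list of (position, methylation_index) for one chromosome,
    sorted by position (hypothetical input layout).
    Returns a list of (start, end, number_of_sites).
    """
    regions = []
    group = []  # positions of the CpG sites in the current group

    def flush():
        # Report the group only if it holds enough adjacent sites.
        if len(group) >= min_sites:
            regions.append((group[0], group[-1], len(group)))
        group.clear()

    for position, index in sites:
        if index < min_index:
            flush()  # next CpG site is not hypermethylated: group ends
            continue
        if group and position - group[-1] > max_gap:
            flush()  # no CpG site within the downstream window: group ends
        group.append(position)
    flush()
    return regions
```

Hypomethylated regions could be found with the same loop by inverting the comparison and using a 20% threshold.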

At block 1060, the first methylation profile of the first tissue is determined by calculating a differential parameter that includes a difference between the methylation density of the second methylation profile and the methylation density of the cell-free methylation profile for each of the plurality of loci. The difference is scaled by the percentage.

In one embodiment, the first methylation density of a locus in the first (e.g., placental) tissue (D) was deduced using the equation:

$D = mbc - \frac{mbc - mp}{f \times CN} \qquad (1)$

where mbc denotes the methylation density of the second methylation profile at a locus (e.g., a short-listed locus as determined in the maternal blood cell bisulfite-sequencing data); mp denotes the methylation density of the corresponding locus in the maternal plasma bisulfite-sequencing data; f represents the percentage of cell-free DNA from the first tissue (e.g., fractional fetal DNA concentration); and CN represents copy number at the locus (e.g., a higher value for amplifications or a lower value for deletions relative to normal). If there is no amplification or deletion in the first tissue, then CN can be one. For trisomy (or a duplication of the region in a tumor or a fetus), CN would be 1.5 (as the increase is from 2 copies to 3 copies), and for monosomy CN would be 0.5. Higher amplification can increase CN by increments of 0.5. In this example, D can correspond to the differential parameter.
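As an illustration, equation (1) can be written as a short Python function. The function and argument names are hypothetical, and the input values in the usage note are made up for the example rather than taken from the study data.

```python
def deduce_density(mbc, mp, f, cn=1.0):
    """Deduced methylation density D of the first tissue, per equation (1).

    mbc: methylation density of the second tissue (e.g., maternal blood cells)
    mp:  methylation density of the same locus in plasma
    f:   fractional concentration of first-tissue DNA, as a fraction (0 < f <= 1)
    cn:  relative copy number (1 normal, 1.5 trisomy, 0.5 monosomy)
    """
    if not 0 < f <= 1:
        raise ValueError("f must be a fraction in (0, 1]")
    if cn <= 0:
        raise ValueError("cn must be positive")
    # The result is deliberately not clamped to [0, 1]; the description
    # treats D as a value that is transformed in a later step.
    return mbc - (mbc - mp) / (f * cn)
```

For example, with mbc = 0.80, mp = 0.74 and f = 0.15, the deduced value is 0.80 − 0.06/0.15 = 0.40.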

At block 1070, the first methylation density is transformed to obtain a corrected first methylation density of the first tissue. The transformation can account for fixed differences between the differential parameters and the actual methylation profile of the first tissue. For example, the values may differ by a fixed constant or by a slope. The transformation can be linear or non-linear.

In one embodiment, the distribution of the deduced values, D, was found to be lower than the actual methylation level of the placental tissue. For example, the deduced values can be linearly transformed using data from CpG islands, which are genomic segments that have an overrepresentation of CpG sites. The genomic positions of CpG islands used in this study were obtained from the UCSC Genome Browser database (NCBI build 36/hg18) (P. A. Fujita et al. 2011 Nucleic Acids Res; 39: D876-882). For example, a CpG island can be defined as a genomic segment with GC content ≥50%, genomic length >200 bp and a ratio of observed/expected CpG number ≥0.6 (M. Gardiner-Garden et al. 1987 J Mol Biol; 196: 261-282).

In one implementation, to derive the linear transformation equation, CpG islands with at least 4 CpG sites and an average read depth ≥5 per CpG site in the sequenced samples can be included. After determining the linear relationships between the methylation densities of CpG islands in the CVS or term placenta and the deduced values, D, the following equations were used to determine the predicted values:

First trimester predicted values=D×1.6+0.2

Third trimester predicted values=D×1.2+0.05
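The description does not state how the linear relationships were fitted. As one possible sketch, assuming an ordinary least-squares fit of the actual CpG island densities against the deduced values, the slope and intercept could be obtained as follows (function names are hypothetical):

```python
def fit_linear_transform(deduced, actual):
    """Least-squares slope and intercept mapping deduced values D to actual
    methylation densities (an assumed fitting method, for illustration)."""
    n = len(deduced)
    if n < 2 or n != len(actual):
        raise ValueError("need two or more paired values")
    mean_x = sum(deduced) / n
    mean_y = sum(actual) / n
    sxx = sum((x - mean_x) ** 2 for x in deduced)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(deduced, actual))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predicted_value(d, slope, intercept):
    """Apply the linear transformation to a deduced value D."""
    return d * slope + intercept
```

With the first trimester coefficients given above, predicted_value(0.25, 1.6, 0.2) evaluates to 0.25×1.6+0.2 = 0.6.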

B. Fetal Example

As mentioned above, method 1000 can be used to deduce a methylation landscape of the placenta from maternal plasma. Circulating DNA in plasma originates predominantly from hematopoietic cells. Still, there is an unknown proportion of cell-free DNA contributed from other internal organs. Moreover, placenta-derived cell-free DNA accounts for approximately 5-40% of the total DNA in maternal plasma, with a mean of approximately 15%. Thus, one can make an assumption that the methylation level in maternal plasma is equivalent to an existing background methylation plus a placental contribution during pregnancy, as described above.

The maternal plasma methylation level, MP, can be determined using the following equation:

MP=BKG×(1−f)+PLN×f

where BKG is the background DNA methylation level in plasma derived from blood cells and internal organs, PLN is the methylation level of placenta and f is the fractional fetal DNA concentration in maternal plasma.

The methylation level of placenta can theoretically be deduced by:

$PLN = \frac{MP - BKG \times (1 - f)}{f} \qquad (2)$

Equations (1) and (2) are equivalent when CN equals one, D equals PLN,and BKG equals mbc.
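The equivalence can be checked in one step. Setting CN equal to one in equation (1) and placing the terms over the common denominator f gives:

$D = mbc - \frac{mbc - mp}{f} = \frac{f \times mbc - mbc + mp}{f} = \frac{mp - mbc \times (1 - f)}{f}$

which has the form of equation (2) once mp is read as MP, mbc as BKG, and D as PLN.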

The methylation level of maternal blood was taken to represent the background methylation of maternal plasma. Besides the loci that were hypermethylated or hypomethylated in maternal blood cells, we further explored the deduction approach by focusing on defined regions with clinical relevance, for instance, CpG islands in the human genome.

The mean methylation density of a total of 27,458 CpG islands (NCBI Build36/hg18) on the autosomes and chrX was derived from the sequencing data of maternal plasma and placenta. Only those with ≥10 CpG sites covered and an average read depth ≥5 per covered CpG site in all analyzed samples, including the placenta, maternal blood and maternal plasma, were selected. As a result, 26,698 CpG islands (97.2%) remained as valid and their methylation level was deduced using the plasma methylation data and the fractional fetal DNA concentration according to the above equation.

It was noticed that the distribution of deduced PLN values was lower than the actual methylation level of CpG islands in the placental tissue. Thus, in one embodiment, the deduced PLN values, or simply deduced values (D), were used as an arbitrary unit for estimating the methylation level of CpG islands in the placenta. After the deduced values were linearly transformed, their distribution became more similar to the actual dataset. The transformed deduced values were named methylation predictive values (MPV) and subsequently used for predicting the methylation level of genetic loci in the placenta.

In this example, the CpG islands were classified into 3 categories based on their methylation densities in the placenta: Low (≤0.4), Intermediate (>0.4-<0.8) and High (≥0.8). Using the deduction equation, we calculated the MPV of the same set of CpG islands and then used the values to classify them into 3 categories with the same cutoffs. By comparing the actual and the deduced datasets, we found that 75.1% of the short-listed CpG islands could be matched correctly to the same categories in the tissue data according to their MPV. About 22% of the CpG islands were assigned to groups with 1-level difference (high versus intermediate, or intermediate versus low) and less than 3% would be completely misclassified (high versus low) (FIG. 11A). The overall classification performance was also determined: 86.1%, 31.4% and 68.8% of CpG islands with methylation densities ≤0.4, >0.4-<0.8 and ≥0.8 in the placenta were deduced to be “Low”, “Intermediate” and “High” correctly (FIG. 11B).
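The three-category comparison described above can be sketched in Python as follows. The cutoffs are those stated in the text; the function names and the output keys are hypothetical, and the sketch does not reproduce the study data.

```python
LEVEL = {"Low": 0, "Intermediate": 1, "High": 2}

def classify(density):
    """Map a methylation density or MPV to Low / Intermediate / High."""
    if density <= 0.4:
        return "Low"
    if density < 0.8:
        return "Intermediate"
    return "High"

def concordance(actual, predicted):
    """Fractions of loci that match exactly, differ by one level, or are
    misclassified (High versus Low)."""
    counts = {"correct": 0, "one_level": 0, "misclassified": 0}
    keys = ("correct", "one_level", "misclassified")
    for a, p in zip(actual, predicted):
        gap = abs(LEVEL[classify(a)] - LEVEL[classify(p)])
        counts[keys[gap]] += 1
    n = sum(counts.values())
    return {key: value / n for key, value in counts.items()}
```

Here `actual` would hold the placental tissue densities and `predicted` the MPV of the same CpG islands, in the same order.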

FIGS. 11A and 11B show graphs of the performance of the prediction algorithm using maternal plasma data and fractional fetal DNA concentration according to embodiments of the present invention. FIG. 11A is a graph 1100 showing the accuracy of CpG island classification using the MPV: correct classification (the deduced category matches exactly the actual dataset); 1-level difference (the deduced category is 1-level different from the actual dataset); and misclassification (the deduced category is opposite to the actual dataset). FIG. 11B is a graph 1150 showing the proportion of CpG islands classified in each deduced category.

Provided that the maternal background methylation is low in the respective genomic regions, the presence of hypermethylated placental-derived DNA in the circulation would increase the overall plasma methylation level to a degree depending on the fractional fetal DNA concentration. A marked change could be observed when the fetal DNA released is fully methylated. On the contrary, when the maternal background methylation is high, the degree of change in the plasma methylation level would become more significant if hypomethylated fetal DNA is released. Therefore, the deduction scheme may be more practical when the methylation level is deduced for genetic loci which are known to be distinct between the maternal background and the placenta, especially for those hypermethylated and hypomethylated markers in the placenta.

FIG. 12A is a table 1200 showing details of 15 selected genomic loci for methylation prediction according to embodiments of the present invention. To confirm the techniques, we selected 15 differentially methylated genomic loci that had been studied previously. The methylation levels of the selected regions were deduced and compared to those of the 15 previously studied differentially methylated genetic loci (R. W. K. Chiu et al. 2007 Am J Pathol; 170: 941-950; S. S. C. Chim et al. 2008 Clin Chem; 54: 500-511; S. S. C. Chim et al. 2005 Proc Natl Acad Sci USA; 102: 14753-14758; D. W. Y. Tsui et al. 2010 PLoS One; 5: e15069).

FIG. 12B is a graph 1250 showing the deduced categories of the 15 selected genomic loci along with their corresponding methylation levels in the placenta. Deduced methylation categories are: Low, ≤0.4; Intermediate, >0.4-<0.8; High, ≥0.8. Table 1200 and graph 1250 show that their methylation levels in placenta could be deduced correctly with some exceptions: RASSF1A, CGI009, CGI137 and VAPA. Out of these 4 markers, only CGI009 showed a marked discrepancy with the actual dataset. The others were just marginally misclassified.

In table 1200, “1” refers to the deduced values (D) being calculated by the equation:

$D = \frac{{MP} - {{BKG} \times \left( {1 - f} \right)}}{f}$

where f is the fractional fetal DNA concentration. The label “2” refers to the methylation predictive values (MPV), i.e., the linearly transformed deduced values obtained using the equation: MPV=D×1.6+0.25. Label “3” refers to the classification cutoff for the deduced values: Low, ≤0.4; Inter(mediate), >0.4-<0.8; High, ≥0.8. Label “4” refers to the classification cutoff for the actual placental dataset: Low, ≤0.4; Inter(mediate), >0.4-<0.8; High, ≥0.8. Label “5” denotes that placental status refers to the methylation status of placenta relative to that of maternal blood cells.

C. Calculation of Fractional Concentrations of Fetal DNA

In one embodiment, the percentage of fetal DNA from the first tissue can be determined using chromosome Y for a male fetus. The proportion of chromosome Y (% chrY) sequences in a maternal plasma sample was a composite of the chromosome Y reads derived from the male fetus and the maternal (female) reads that were misaligned to chromosome Y (R. W. K. Chiu et al. 2011 BMJ; 342: c7401). Thus, the relationship between % chrY and the fractional fetal DNA concentration (f) in the sample can be given by:

% chrY=% chrY_(male) ×f+% chrY_(female)×(1−f)

where % chrY_(male) refers to a proportion of reads aligned to chromosome Y in a plasma sample containing 100% male DNA; and % chrY_(female) refers to the proportion of reads aligned to chromosome Y in a plasma sample containing 100% female DNA.

% chrY can be determined from reads that were aligned to chromosome Y with no mismatches for a sample from a female pregnant with a male fetus, e.g., where the reads are from bisulfite-converted samples. The % chrY_(male) value can be obtained from the bisulfite-sequencing of two adult male plasma samples. The % chrY_(female) value can be obtained from the bisulfite-sequencing of two non-pregnant adult female plasma samples.
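Rearranging the relationship above for f gives f = (% chrY − % chrY_(female)) / (% chrY_(male) − % chrY_(female)). A minimal Python sketch follows; the function name is hypothetical, and the numbers in the usage note are made-up illustrative values, not measurements from the study.

```python
def fetal_fraction_from_chry(pct_chry, pct_chry_male, pct_chry_female):
    """Solve %chrY = %chrY_male * f + %chrY_female * (1 - f) for f.

    All three inputs must use the same unit (e.g., proportion of reads).
    """
    denominator = pct_chry_male - pct_chry_female
    if denominator <= 0:
        raise ValueError("male reference must exceed female reference")
    return (pct_chry - pct_chry_female) / denominator
```

For instance, with hypothetical reference values of 0.2 for male plasma and 0.004 for female plasma, an observed value of 0.0236 corresponds to f = 0.0196/0.196 = 0.1.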

In other embodiments, the fetal DNA percentage can be determined from fetal-specific alleles on an autosome. As another example, epigenetic markers may be used to determine the fetal DNA percentage. Other ways of determining the fetal DNA percentage may also be used.

D. Method of Using Methylation to Determine Copy Number

The placental genome is more hypomethylated than the maternal genome. As discussed above, the methylation of the plasma of a pregnant woman is dependent on the fractional concentration of placentally-derived fetal DNA in the maternal plasma. Therefore, through the analysis of the methylation density of a chromosomal region, it is possible to detect the difference in the contribution of fetal tissues to the maternal plasma. For example, in a pregnant woman carrying a trisomic fetus (e.g., suffering from trisomy 21 or trisomy 18 or trisomy 13), the fetus would contribute an additional amount of the DNA from the trisomic chromosome to the maternal plasma when compared with the disomic chromosomes. In this situation, the plasma methylation density for the trisomic chromosome (or any chromosomal region that has an amplification) would be lower than those for the disomic chromosomes. The degree of difference can be predicted by mathematical calculation by taking into account the fractional fetal DNA concentration in the plasma sample. The higher the fractional fetal DNA concentration in the plasma sample, the larger the difference in methylation density between the trisomic and disomic chromosomes would be. For regions having a deletion, the methylation density would be higher.

From the previous discussion, the plasma methylation density for a disomic chromosome (MP_(Non-aneu)) can be calculated as: MP_(Non-aneu)=BKG×(1−f)+PLN×f,

where BKG is the background DNA methylation level in plasma derived from blood cells and internal organs, PLN is the methylation level of placenta and f is the fractional fetal DNA concentration in maternal plasma.

The plasma methylation density for a trisomic chromosome (MP_(Aneu)) can be calculated as: MP_(Aneu)=BKG×(1−f)+PLN×f×1.5, where the 1.5 corresponds to the copy number CN and the addition of one more chromosome is a 50% increase. The difference between the trisomic and disomic chromosomes (MP_(Diff)) would be

MP_(Diff)=PLN×f×0.5.

In one embodiment, a comparison of the methylation density of the potentially aneuploid chromosome (or chromosomal region) to one or more other presumed non-aneuploid chromosome(s) or the overall methylation density of the genome can be used to effectively normalize the fetal DNA concentration in the plasma sample. The comparison can be via a calculation of a parameter (e.g., involving a ratio or a difference) between the methylation densities of the two regions to obtain a normalized methylation density. The comparison can remove a dependence of the resulting methylation level (e.g., determined as a parameter from the two methylation densities) on the fetal DNA concentration.

If the methylation density of the potentially aneuploid chromosome is not normalized to the methylation density of one or more other chromosome(s), or other parameters that reflect the fractional concentration of fetal DNA, the fractional concentration would be a major factor affecting the methylation density in the plasma. For example, the plasma methylation density of chromosome 21 of a pregnant woman carrying a trisomy 21 fetus with a fractional fetal DNA concentration of 10% would be the same as that of a pregnant woman carrying a euploid fetus with a fractional fetal DNA concentration of 15%, whereas a normalized methylation density would show a difference.

In another embodiment, the methylation density of the potentially aneuploid chromosome can be normalized to the fractional fetal DNA concentration. For example, the following equation can be applied to normalize the methylation density: MP_(Normalized)=MP_(non-normalized)+(BKG−PLN)×f, where MP_(Normalized) is the methylation density normalized with the fractional fetal DNA concentration in the plasma, MP_(non-normalized) is the measured methylation density, BKG is the background methylation density from maternal blood cells or tissues, PLN is the methylation density in the placental tissues, and f is the fractional fetal DNA concentration. The methylation densities of BKG and PLN could be based on reference values previously established from maternal blood cells and placental tissues obtained from healthy pregnancies. Different genetic and epigenetic methods can be used for the determination of the fractional fetal DNA concentration in the plasma sample, for example by the measurement of the percentage of sequence reads from the chromosome Y using massively parallel sequencing or PCR on non-bisulfite-converted DNA.
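As a check on this normalization, one can substitute the disomic expression MP_(Non-aneu)=BKG×(1−f)+PLN×f for the measured methylation density:

$MP_{Normalized} = BKG \times (1 - f) + PLN \times f + (BKG - PLN) \times f = BKG$

Under this model, the normalized methylation density of a non-aneuploid region reduces to BKG and no longer depends on f, so samples with different fractional fetal DNA concentrations can be compared against the same reference.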

In one implementation, the normalized methylation density for a potentially aneuploid chromosome can be compared to a reference group which consists of pregnant women carrying euploid fetuses. The mean and SD of the normalized methylation density of the reference group can be determined. Then the normalized methylation density of the tested case can be expressed as a z-score which indicates the number of SDs from the mean of the reference group by:

${\text{z-score}} = \frac{{MP}_{Normalized} - {Mean}}{SD},$

where MP_(Normalized) is the normalized methylation density for the tested case, Mean is the mean of the normalized methylation density of the reference cases and SD is the standard deviation of the normalized methylation density of the reference cases. A cutoff, for example z-score <−3, can be used to classify whether a chromosome is significantly hypomethylated and, hence, to determine the aneuploidy status of the sample.
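The z-score comparison can be sketched in Python as follows. The sketch assumes the sample standard deviation (n − 1 denominator) for the reference group, which the description does not specify; the function names and the numbers in the usage note are hypothetical.

```python
from statistics import mean, stdev

def z_score(tested_value, reference_values):
    """Number of reference SDs between the tested value and the reference mean.

    Uses the sample standard deviation (an assumption for illustration).
    """
    return (tested_value - mean(reference_values)) / stdev(reference_values)

def is_significantly_hypomethylated(tested_value, reference_values, cutoff=-3.0):
    """Apply the example cutoff of z-score < -3 from the description."""
    return z_score(tested_value, reference_values) < cutoff
```

With illustrative reference values of 0.70, 0.72 and 0.74 (mean 0.72, SD 0.02), a tested value of 0.64 lies 4 SDs below the mean and would be flagged, whereas 0.71 would not.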

In another embodiment, the MP_(Diff) can be used as the normalized methylation density. In such an embodiment, PLN can be deduced, e.g., using method 1000. In some implementations, a reference methylation density (which can be normalized using f) can be determined from a methylation level of a non-aneuploid region. For example, the Mean could be determined from one or more chromosomal regions of the same sample. The cutoff could be scaled by f, or just set to a level sufficient as long as a minimum concentration exists.

Accordingly, a comparison of a methylation level for a region to a cutoff can be accomplished in various ways. The comparison can involve a normalization (e.g., as described above), which may be performed equivalently on the methylation level or the cutoff value, depending on how the values are defined. Thus, whether the determined methylation level of a region is statistically different than a reference level (determined from same sample or other samples) can be determined in a variety of ways.

The above analysis can be applied to the analysis of chromosomal regions, which can include a whole chromosome or parts of the chromosome, including contiguous or disjoint subregions of a chromosome. In one embodiment, the potentially aneuploid chromosome can be divided into a number of bins. The bins can be of the same or different sizes. The methylation density of each bin can be normalized to the fractional concentration of the sample or to the methylation density of one or more presumed non-aneuploid chromosome(s) or the overall methylation density of the genome. The normalized methylation density of each bin can then be compared with a reference group to determine if it is significantly hypomethylated. Then the percentage of bins being significantly hypomethylated can be determined. A cutoff, for example more than 5%, 10%, 15%, 20% or 30% of the bins being significantly hypomethylated, can be used to classify the aneuploidy status of the case.
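The bin-based classification can be illustrated with a short Python sketch. It assumes that a z-score has already been computed for each bin against the reference group; the function names are hypothetical, and the default cutoffs (z-score <−3 and more than 5% of bins) are the example values given in the description.

```python
def fraction_hypomethylated_bins(bin_z_scores, z_cutoff=-3.0):
    """Fraction of bins whose z-score falls below the cutoff."""
    if not bin_z_scores:
        raise ValueError("at least one bin is required")
    flagged = sum(1 for z in bin_z_scores if z < z_cutoff)
    return flagged / len(bin_z_scores)

def classify_region(bin_z_scores, z_cutoff=-3.0, fraction_cutoff=0.05):
    """Call the region aneuploid when more than fraction_cutoff of its bins
    are significantly hypomethylated."""
    fraction = fraction_hypomethylated_bins(bin_z_scores, z_cutoff)
    return "aneuploid" if fraction > fraction_cutoff else "euploid"
```

Other cutoffs from the list above (10%, 15%, 20% or 30%) can be passed through `fraction_cutoff`.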

When one is testing for an amplification or a deletion, one can compare the methylation density to a reference methylation density, which may be specific for a particular region being tested. Each region may have a different reference methylation density as methylation can vary from region to region, particularly depending on the size of the regions (e.g., smaller regions will show more variation).

As mentioned above, one or more pregnant women each carrying a euploid fetus can be used to define the normal range of the methylation density for a region of interest or a difference in methylation density between two chromosomal regions. A normal range can also be determined for the PLN (e.g., by direct measurement or as deduced by method 1000). In other embodiments, a ratio between two methylation densities, e.g., of a potentially aneuploid chromosome and a non-aneuploid chromosome, can be used for the analysis instead of their difference. This methylation analysis approach can be combined with a sequence read counting approach (R. W. K. Chiu et al. 2008 Proc Natl Acad Sci USA; 105: 20458-20463) and approaches involving size analysis of plasma DNA (US Patent Publication 2011/0276277) to determine or confirm an aneuploidy.

The use of BKG can account for variations in the background between samples. For example, one female might have different BKG methylation levels than another female, but a difference between the BKG and PLN can be used across samples in such situations. The cutoff for different chromosomal regions can be different, e.g., when a methylation density of one region of the genome differs relative to another region of the genome.

This approach can be generalized to detect any chromosomal aberrations, including deletion and amplification, in the fetal genome. In addition, the resolution of this analysis can be adjusted to the desired level; for example, the genome can be divided into 10 Mb, 5 Mb, 2 Mb, 1 Mb, 500 kb, or 100 kb bins. Hence, this technology can also be used for detecting subchromosomal duplication or subchromosomal deletion. This technology would thus allow a prenatal fetal molecular karyotype to be obtained noninvasively. When used in this manner, this technology can be used in combination with the noninvasive prenatal testing methods that are based on the counting of molecules (A. Srinivasan et al. 2013 Am J Hum Genet; 92:167-176). In other embodiments, the size of the bins need not be identical. For example, the size of the bins may be adjusted so that each bin contains an identical number of CpG dinucleotides. In this case, the physical size of the bins would be different.
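Binning by CpG count rather than by physical size can be sketched as below. This is an illustrative Python sketch under the assumption that the CpG positions of one chromosome are available as a sorted list; the function name and the handling of the final, possibly shorter bin are choices made for the example.

```python
def bins_by_cpg_count(cpg_positions, cpgs_per_bin):
    """Split sorted CpG positions into bins holding the same number of CpG
    dinucleotides; returns (start, end) coordinates for each bin.

    The last bin may contain fewer CpG sites if the total is not a
    multiple of cpgs_per_bin.
    """
    if cpgs_per_bin < 1:
        raise ValueError("cpgs_per_bin must be at least 1")
    bins = []
    for i in range(0, len(cpg_positions), cpgs_per_bin):
        chunk = cpg_positions[i:i + cpgs_per_bin]
        bins.append((chunk[0], chunk[-1]))
    return bins
```

Bins produced this way span different physical lengths, with CpG-dense segments yielding shorter bins.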

The equation can be rewritten to apply to different types of chromosome aberrations as MP_(Diff)=(BKG−PLN)×f×0.5×CN. Here CN represents the copy number change at the affected region. CN equals 1 for the gain of 1 copy of a chromosome, 2 for the gain of 2 copies of a chromosome and −1 for the loss of one of the two homologous chromosomes (e.g., for detecting fetal Turner syndrome in which a female fetus has lost one of the X chromosomes, leading to an XO karyotype). This equation need not be changed when the size of the bins is changed. However, the sensitivity and specificity may be reduced when a smaller bin size is used because a smaller number of CpG dinucleotides (or other nucleotide combinations showing differential methylation between fetal DNA and maternal DNA) would be present in smaller bins, leading to increased stochastic variation in the measurement of methylation densities. In one embodiment, the number of reads required can be determined by analyzing the coefficient of variation of the methylation density and the desired level of sensitivity.

To demonstrate the feasibility of this approach, we analyzed the plasma samples from 9 pregnant women. Five of the women were each carrying a euploid fetus, and the other four were each carrying a trisomy 21 (T21) fetus. Three of the five euploid pregnancies were randomly selected to form a reference group. The remaining two euploid pregnancy cases (Eu1 and Eu2) and the four T21 cases (T21-1, T21-2, T21-3 and T21-4) were analyzed using this approach to test for a potential T21 status. The plasma DNA was bisulfite-converted and sequenced using the Illumina HiSeq2000 platform. In one embodiment, the methylation density of individual chromosomes was calculated. The difference in methylation density between chromosome 21 and the mean of the other 21 autosomes was then determined to obtain a normalized methylation density. The mean and SD of the reference group were used for the calculation of the z-score of the six test cases.

TABLE 1
Using a cutoff of <−3 for the z-score to classify a sample as T21, the classification of all the euploid and T21 cases was correct.

Case      z-score for MP_(Diff) between chr 21 and other autosomes
Eu1       −1.48
Eu2       1.09
T21-1     −4.46
T21-2     −5.30
T21-3     −8.06
T21-4     −5.69

In another embodiment, the genome was divided into 1 Mb bins and the methylation density for each 1 Mb bin can be determined. The methylation density of all the bins on the potentially aneuploid chromosome can be normalized with the median methylation density of all the bins located on the presumed non-aneuploid chromosomes. In one implementation, for each bin, the difference in methylation density from the median of the non-aneuploid bins can be calculated. Then a z-score can be calculated for these values using the mean and SD values of the reference group.

TABLE 2
Using 5% as a cutoff for the percentage of bins that are significantly hypomethylated on chromosome 21, all the cases were classified correctly for T21 status.

Case      Percentage of bins on chr 21 having a z-score of MP_(Diff) <−3
Eu1       0%
Eu2       0%
T21-1     33.3%
T21-2     58.3%
T21-3     19.4%
T21-4     52.8%

This DNA methylation-based approach for detecting fetal chromosomal or subchromosomal aberrations can be used in conjunction with those based on the counting of molecules such as by sequencing (R. W. K. Chiu et al. 2008 Proc Natl Acad Sci USA; 105: 20458-20463) or digital PCR (Y. M. Lo et al. 2007 Proc Natl Acad Sci USA; 104: 13116-13121), or the sizing of DNA molecules (US Patent Publication 2011/0276277). Such a combination (e.g., DNA methylation plus molecular counting, or DNA methylation plus sizing, or DNA methylation plus molecular counting plus sizing) would have a synergistic effect which would be advantageous in a clinical setting, e.g., improving the sensitivity and specificity. For example, the number of DNA molecules that would need to be analyzed, e.g., by sequencing, can be reduced without adversely impacting the diagnostic accuracy. This feature would allow such tests to be done more economically. As another example, for a given number of DNA molecules analyzed, a combined approach would allow fetal chromosomal or subchromosomal aberrations to be detected at a lower fractional concentration of fetal DNA.

FIG. 13 is a flowchart of a method 1300 for detecting a chromosomal abnormality from a biological sample of an organism. The biological sample includes cell-free DNA comprising a mixture of cell-free DNA originating from a first tissue and from a second tissue. The first tissue may be from a fetus or tumor and the second tissue may be from a pregnant female or a patient.

At block 1310, a plurality of DNA molecules from the biological sample are analyzed. The analysis of a DNA molecule can include determining a location of the DNA molecule in a genome of the organism and determining whether the DNA molecule is methylated at one or more sites. The analysis can be performed by receiving sequence reads from a methylation-aware sequencing, and thus the analysis can be performed just on data previously obtained from the DNA. In other embodiments, the analysis can include the actual sequencing or other active steps of obtaining the data.

The determining of a location can include mapping the DNA molecules (e.g., via sequence reads) to respective parts of the human genome, e.g., to specific regions. In one implementation, if a read does not map to a region of interest, then the read can be ignored.

At block 1320, a respective number of DNA molecules that are methylated at the site is determined for each of a plurality of sites. In one embodiment, the sites are CpG sites, and may be only certain CpG sites, as selected using one or more criteria mentioned herein. Determining the number of DNA molecules that are methylated is equivalent to determining the number that are unmethylated once normalization is performed using a total number of DNA molecules analyzed at a particular site, e.g., a total number of sequence reads.

At block 1330, a first methylation level of a first chromosomal region is calculated based on the respective numbers of DNA molecules methylated at sites within the first chromosomal region. The first chromosomal region can be of any size, e.g., sizes mentioned above. The methylation level can account for a total number of DNA molecules aligned to the first chromosomal region, e.g., as part of a normalization procedure.

The first chromosomal region may be of any size (e.g., a whole chromosome) and may be composed of disjoint subregions, i.e., subregions that are separated from each other. Methylation levels of each subregion can be determined and then combined, e.g., as an average or median, to determine a methylation level for the first chromosomal region.
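Blocks 1320 and 1330 can be illustrated with a minimal Python sketch. It assumes that the methylation level is the number of methylated reads divided by the total number of reads over the sites of the region; the function names and the input layout are hypothetical.

```python
from statistics import mean, median

def methylation_level(site_counts):
    """Methylation level of a region.

    site_counts: list of (methylated_reads, total_reads) pairs, one per site.
    Dividing by the total number of reads acts as the normalization.
    """
    methylated = sum(m for m, _ in site_counts)
    total = sum(t for _, t in site_counts)
    if total == 0:
        raise ValueError("no reads cover the region")
    return methylated / total

def combined_level(subregions, how="mean"):
    """Combine the levels of disjoint subregions as an average or a median."""
    levels = [methylation_level(counts) for counts in subregions]
    return median(levels) if how == "median" else mean(levels)
```

For a region with site counts of (3, 5), (4, 5) and (1, 10), the sketch gives 8 methylated reads out of 20, i.e. a level of 0.4.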

At block 1340, the first methylation level is compared to a cutoff value. The cutoff value may be a reference methylation level or be related to a reference methylation level (e.g., a specified distance from a normal level). The cutoff value may be determined from other pregnant female subjects carrying fetuses without a chromosomal abnormality for the first chromosomal region, from samples of individuals without cancer, or from loci of the organism that are known to not be associated with an aneuploidy (i.e., regions that are disomic).

In one embodiment, the cutoff value can be defined as having a difference from a reference methylation level of (BKG−PLN)×f×0.5×CN, where BKG is the background of the female (or an average or median from other subjects), f is the concentration of cell-free DNA originating from the first tissue, and CN is a copy number being tested. CN is an example of a scale factor corresponding to a type of abnormality (deletion or duplication). A cutoff for a CN of 1 can be used to test all amplifications initially, and then further cutoffs can be used to determine the degree of amplification. The cutoff value can be based on a concentration of cell-free DNA originating from the first tissue using other formulas.
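The cutoff expression can be written out as a short Python sketch. PLN is not defined in this passage; the sketch assumes it denotes the methylation level of the first tissue (e.g., placenta), and the function name and the numerical inputs are hypothetical.

```python
def cutoff_difference(bkg, pln, f, cn):
    """Difference from the reference methylation level used as a cutoff.

    bkg: background methylation level of the female (or of other subjects).
    pln: assumed here to be the methylation level of the first tissue.
    f:   fractional concentration of cell-free DNA from the first tissue.
    cn:  copy number being tested, acting as a scale factor.
    """
    return (bkg - pln) * f * 0.5 * cn

# Illustrative values only: 70% background, 50% first-tissue level,
# 10% fractional concentration, testing a copy number of 1.
example = cutoff_difference(0.70, 0.50, 0.10, 1)
```

With these illustrative inputs the expected difference is about 0.01, i.e. one percentage point of methylation level.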

At block 1350, a classification of an abnormality for the first chromosomal region is determined based on the comparison. A statistically significant difference in levels can indicate increased risk of the fetus having a chromosomal abnormality. In various embodiments, the chromosomal abnormality can be trisomy 21, trisomy 18, trisomy 13, Turner syndrome, or Klinefelter syndrome. Other examples are a subchromosomal deletion, subchromosomal duplication, or DiGeorge syndrome.

V. Determination of Markers

As noted above, certain parts of the fetal genome are methylated differently than the maternal genome. These differences can be common across pregnancies. The regions of different methylation can be used to identify DNA fragments that are from the fetus.

A. Method to Determine DMRs from Placental Tissue and Maternal Tissue

The placenta has tissue-specific methylation signatures. Fetal-specific DNA methylation markers have been developed for maternal plasma detection and for noninvasive prenatal diagnostic applications based on loci that are differentially methylated between placental tissues and maternal blood cells (S. S. C. Chim et al. 2008 Clin Chem; 54: 500-511; E. A. Papageorgiou et al. 2009 Am J Pathol; 174: 1609-1618; and T. Chu et al. 2011 PLoS One; 6: e14723). Embodiments for mining for such differentially methylated regions (DMRs) on a genome-wide basis are provided.

FIG. 14 is a flowchart of a method 1400 for identifying methylation markers by comparing a placental methylation profile to a maternal methylation profile (e.g., determined from blood cells) according to embodiments of the present invention. Method 1400 may also be used to determine markers for a tumor by comparing a tumor methylation profile to a methylation profile corresponding to healthy tissue.

At block 1410, a placental methylome and a blood methylome are obtained. The placental methylome can be determined from a placental sample, e.g., CVS or a term placenta. Methylome should be understood to possibly include methylation densities of only part of a genome.

At block 1420, a region is identified that includes a specified number of sites (e.g., 5 CpG sites) and for which a sufficient number of reads have been obtained. In one embodiment, the identification began from one end of each chromosome to locate the first 500-bp region that contained at least five qualified CpG sites. A CpG site may be deemed qualified if the site was covered by at least five sequence reads.

At block 1430, a placental methylation index and a blood methylation index are calculated for each site. For example, the methylation index was calculated individually for all qualified CpG sites within each 500-bp region.

At block 1440, the methylation indices were compared between the maternal blood cells and the placental sample to determine if the sets of indices were different from each other. For example, the methylation indices were compared between the maternal blood cells and the CVS or the term placenta using, for example, the Mann-Whitney test. A P-value of, for example, ≤0.01 was considered as statistically significantly different, although other values may be used, where a lower number would reduce false positive regions.

In one embodiment, if the number of qualified CpG sites was less than five or the Mann-Whitney test was non-significant, the 500-bp region was shifted downstream by 100 bp. The region continued to be shifted downstream until the Mann-Whitney test became significant for a 500-bp region. The next 500-bp region would then be considered. If the next region was found to exhibit statistical significance by the Mann-Whitney test, it would be added to the current region as long as the combined contiguous region is no larger than 1,000 bp.
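The window search of blocks 1420 to 1440 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the statistical comparison (e.g., a Mann-Whitney test at P ≤ 0.01) is supplied by the caller as the hypothetical callable is_significant, the "next" region is taken to start where the current region ends, and coordinates are 0-based positions on one chromosome.

```python
def scan_windows(sites, chrom_length, is_significant,
                 window=500, step=100, min_sites=5, min_cov=5, max_merged=1000):
    """Return (start, end) regions that pass the site-count and significance checks.

    sites: list of (position, coverage) pairs for CpG sites.
    is_significant: callable taking the qualified positions in a window and
        returning True when the two sets of methylation indices differ.
    """
    # A site is qualified when it is covered by at least min_cov reads.
    qualified = sorted(pos for pos, cov in sites if cov >= min_cov)

    def passes(lo, hi):
        in_window = [p for p in qualified if lo <= p < hi]
        return len(in_window) >= min_sites and is_significant(in_window)

    regions = []
    start = 0
    while start + window <= chrom_length:
        if not passes(start, start + window):
            start += step  # shift the window downstream by 100 bp
            continue
        end = start + window
        # Add following significant windows while the merged span stays <= 1,000 bp.
        while (end + window <= chrom_length
               and end + window - start <= max_merged
               and passes(end, end + window)):
            end += window
        regions.append((start, end))
        start = end
    return regions
```

In practice is_significant would wrap a rank-based test on the placental and blood methylation indices of the qualified sites; it is left abstract here so that the sketch has no external dependencies.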

At block 1450, adjacent regions that were statistically significantly different (e.g., by the Mann-Whitney test) can be merged. Note the difference is between the methylation indices for the two samples. In one embodiment, if the adjacent regions are within a specified distance (e.g., 1,000 bp) of each other and if they showed a similar methylation profile then they would be merged. In one implementation, the similarity of the methylation profile between adjacent regions can be defined using any of the following: (1) showing the same trend in the placental tissue with reference to the maternal blood cells, e.g. both regions were more methylated in the placental tissues than the blood cells; (2) with differences in methylation densities of less than 10% for the adjacent regions in the placental tissue; and (3) with differences in methylation densities of less than 10% for the adjacent regions in the maternal blood cells.

At block 1460, methylation densities of the maternal blood cell DNA (the blood methylome) and of the placental sample (e.g., CVS or term placental tissue) at the regions were calculated. The methylation densities can be determined as described herein.

At block 1470, putative DMRs, where a total placental methylation density and a total blood methylation density for all the sites in the region are statistically significantly different, are determined. In one embodiment, all qualified CpG sites within a merged region are subjected to a χ² test. The χ² test assessed if the number of methylated cytosines as a proportion of the methylated and unmethylated cytosines among all the qualified CpG sites within the merged region was statistically significantly different between the maternal blood cells and placental tissue. In one implementation, for the χ² test, a P-value of ≤0.01 may be considered as statistically significantly different. The merged segments that showed significance by the χ² test were considered as putative DMRs.
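The χ² comparison of block 1470 can be illustrated with a short Python sketch on a 2×2 table of methylated and unmethylated cytosine counts. It assumes a Pearson χ² statistic without continuity correction and one degree of freedom; whether a correction was applied is not stated in the text, and the function names are hypothetical.

```python
import math

def chi_square_2x2(meth_blood, unmeth_blood, meth_placenta, unmeth_placenta):
    """Pearson chi-square statistic and P-value (1 degree of freedom)."""
    a, b, c, d = meth_blood, unmeth_blood, meth_placenta, unmeth_placenta
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

def is_putative_dmr(meth_blood, unmeth_blood, meth_placenta, unmeth_placenta,
                    alpha=0.01):
    """True when the two proportions differ at P <= alpha."""
    _, p_value = chi_square_2x2(meth_blood, unmeth_blood,
                                meth_placenta, unmeth_placenta)
    return p_value <= alpha
```

The counts are pooled over all qualified CpG sites of the merged region before the table is formed.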

At block 1480, loci where the methylation densities of the maternal blood cell DNA were above a high cutoff or below a low cutoff were identified. In one embodiment, loci were identified where the methylation densities of the maternal blood cell DNA were either ≤20% or ≥80%. In other embodiments, bodily fluids other than maternal blood can be used, including, but not limited to, saliva, uterine or cervical lavage fluid from the female genital tract, tears, sweat, and urine.

A key to the successful development of DNA methylation markers that are fetal-specific in maternal plasma can be that the methylation status of the maternal blood cells is either as highly methylated or as unmethylated as possible. This can reduce (e.g., minimize) the chance of having maternal DNA molecules interfering with the analysis of the placenta-derived fetal DNA molecules which show an opposite methylation profile. Thus, in one embodiment, candidate DMRs were selected by further filtering. The candidate hypomethylated loci were those that showed methylation densities ≤20% in the maternal blood cells and with at least 20% higher methylation densities in the placental tissues. The candidate hypermethylated loci were those that showed methylation densities ≥80% in the maternal blood cells and with at least 20% lower methylation densities in the placental tissues. Other percentages may be used.

At block 1490, DMRs were then identified among the subset of loci where the placental methylation densities are significantly different from the blood methylation densities by comparing the difference to a threshold. In one embodiment, the threshold is 20%, so the methylation densities differed by at least 20% from the methylation densities of the maternal blood cells. Accordingly, a difference between placental methylation densities and blood methylation densities at each identified locus can be calculated. The difference can be a simple subtraction. In other embodiments, scaling factors and other functions can be used to determine the difference (e.g., the difference can be the result of a function applied to the simple subtraction).

In one implementation, using this method, 11,729 hypermethylated and 239,747 hypomethylated loci were identified from the first trimester placental sample. The top 100 hypermethylated loci are listed in table S2A of the appendix. The top 100 hypomethylated loci are listed in table S2B of the appendix. The tables list the chromosome, the start and end location, the size of the region, the methylation density in maternal blood, the methylation density in the placenta sample, the P-values (which are all very small), and the methylation difference. The locations correspond to reference genome hg18, which can be found at hgdownload.soe.ucsc.edu/goldenPath/hg18/chromosomes.

11,920 hypermethylated and 204,768 hypomethylated loci were identified from the third trimester placental sample. The top 100 hypermethylated loci for the 3^(rd) trimester are listed in table S2C, and the top 100 hypomethylated loci are listed in table S2D. Thirty-three loci that were previously reported to be differentially methylated between maternal blood cells and first trimester placental tissues were used to validate our list of first trimester candidates. 79% of the 33 loci had been identified as DMRs using our algorithm.

FIG. 15A is a table 1500 showing the performance of the first trimester DMR identification algorithm using the placental methylome with reference to 33 previously reported first trimester markers. In the table, “a” indicates that loci 1 to 15 were previously described in (R. W. K. Chiu et al. 2007 Am J Pathol; 170:941-950 and S. S. C. Chim et al. 2008 Clin Chem; 54:500-511); loci 16 to 23 were previously described in (K. C. Yuen, thesis 2007, The Chinese University of Hong Kong, Hong Kong); and loci 24 to 33 were previously described in (E. A. Papageorgiou et al. 2009 Am J Pathol; 174:1609-1618). “b” indicates that these data were derived from the above publications. “c” indicates that methylation densities of maternal blood cells and chorionic villus sample and their differences were observed from the sequencing data generated in the present study but based on the genomic coordinates provided by the original studies. “d” indicates data on the loci identified using embodiments of method 1400 on the bisulfite sequencing data without taking reference from the publications cited above by Chiu et al. (2007), Chim et al. (2008), Yuen (2007) and Papageorgiou et al. (2009). The span of the loci included the previously reported genomic regions but in general spanned larger regions. “e” indicates that a candidate DMR was classified as true-positive (TP) or false-negative (FN) based on the requirement of observing >0.20 difference between the methylation densities of the corresponding genome coordinates of the DMRs in maternal blood cells and chorionic villus sample.

FIG. 15B is a table 1550 showing the performance of the third trimester DMR identification algorithm using the placental methylome measured using the placenta sample obtained at delivery. “a” indicates that the same list of 33 loci as described in FIG. 15A was used. “b” indicates that as the 33 loci were previously identified from early pregnancy samples, they might not be applicable to the third trimester data. Hence, the bisulfite sequencing data generated in the present study on the term placental tissue based on the genomic coordinates provided by the original studies were reviewed. A difference of >0.20 in the methylation densities between the maternal blood cell and term placental tissue was used to determine if the loci were indeed true DMRs in the third trimester. “c” indicates that the data on the loci were identified using method 1400 on the bisulfite sequencing data without taking reference from previously cited publications by Chiu et al. (2007), Chim et al. (2008), Yuen (2007) and Papageorgiou et al. (2009). The span of the loci included the previously reported genomic regions but in general spanned larger regions. “d” indicates that candidate DMRs that contained loci which qualified as differentially methylated in the third trimester were classified as true-positive (TP) or false-negative (FN) based on the requirement of observing >0.20 difference between the methylation densities of the corresponding genome coordinates of the DMRs in maternal blood cells and term placental tissue. For loci that did not qualify as differentially methylated in the third trimester, their absence in the DMR list or the presence of a DMR containing the loci but showing methylation difference of <0.20 was considered as true negative (TN) DMRs.

B. DMRs from the Maternal Plasma Sequencing Data

One should be able to identify placental tissue DMRs directly from the maternal plasma DNA bisulfite-sequencing data provided that the fractional fetal DNA concentration of the sample was also known. It is possible because the placenta is the predominant source of fetal DNA in maternal plasma (S. S. Chim et al. 2005 Proc Natl Acad Sci USA 102, 14753-14758) and we showed in this study that the methylation status of fetal-specific DNA in maternal plasma correlated with the placental methylome.

Therefore, aspects of method 1400 may be implemented using a plasma methylome to determine a deduced placental methylome instead of using a placental sample. Thus, method 1000 and method 1400 can be combined to determine DMRs. Method 1000 can be used to determine the predicted values for the placental methylation profile and use them in method 1400. For this analysis, the example also focuses on loci that were either ≤20% or ≥80% methylated in the maternal blood cells.

In one implementation, to deduce loci that were hypermethylated in the placental tissues with respect to maternal blood cells, we sorted for loci that showed ≤20% methylation in maternal blood cells, and ≥60% methylation according to the predicted value with a difference of at least 50% between the blood cell methylation density and the predicted value. To deduce loci that were hypomethylated in the placental tissues with respect to maternal blood cells, we sorted for loci that showed ≥80% methylation in maternal blood cells, and ≤40% methylation according to the predicted value with a difference of at least 50% between the blood cell methylation density and the predicted value.
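The selection rules of this implementation can be expressed as a small Python sketch. Methylation densities are given as fractions between 0 and 1, the predicted value is assumed to come from method 1000, and the function name and return labels are hypothetical.

```python
def classify_deduced_locus(blood, predicted):
    """Label a locus from the blood cell density and the predicted placental value.

    Returns "hypermethylated" or "hypomethylated" (in the placental tissue with
    respect to maternal blood cells), or None when neither rule is met.
    """
    if blood <= 0.20 and predicted >= 0.60 and predicted - blood >= 0.50:
        return "hypermethylated"
    if blood >= 0.80 and predicted <= 0.40 and blood - predicted >= 0.50:
        return "hypomethylated"
    return None
```

For example, a locus at 5% in blood cells with a predicted value of 70% is labeled hypermethylated, whereas a locus at 15% with a predicted value of 62% is not selected because the difference is below 50%.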

FIG. 16 is a table 1600 showing the numbers of loci predicted to be hypermethylated or hypomethylated based on direct analysis of the maternal plasma bisulfite-sequencing data. “N/A” means not applicable. “a” indicates that the search for hypermethylated loci started from the list of loci showing methylation densities <20% in the maternal blood cells. “b” indicates that the search for hypomethylated loci started from the list of loci showing methylation densities >80% in the maternal blood cells. “c” indicates that bisulfite-sequencing data from the chorionic villus sample was used for verifying the first trimester maternal plasma data, and the term placental tissue was used for verifying the third trimester maternal plasma data.

As shown in table 1600, a majority of the noninvasively deduced loci showed the expected methylation pattern in the tissues and overlapped with the DMRs mined from the tissue data and presented in the earlier section. The appendix lists DMRs identified from the plasma. Table S3A lists the top 100 loci deduced to be hypermethylated from the first trimester maternal plasma bisulfite-sequencing data. Table S3B lists the top 100 loci deduced to be hypomethylated from the first trimester maternal plasma bisulfite-sequencing data. Table S3C lists the top 100 loci deduced to be hypermethylated from the third trimester maternal plasma bisulfite-sequencing data. Table S3D lists the top 100 loci deduced to be hypomethylated from the third trimester maternal plasma bisulfite-sequencing data.

C. Gestational Variation in Placental and Fetal Methylomes

The overall proportion of methylated CpGs in the CVS was 55% while it was 59% for the term placenta (table 100 of FIG. 1). More hypomethylated DMRs could be identified from CVS than the term placenta while the number of hypermethylated DMRs was similar for the two tissues. Thus, it was evident that the CVS was more hypomethylated than the term placenta. This gestational trend was also apparent in the maternal plasma data. The proportion of methylated CpGs among the fetal-specific reads was 47.0% in the first trimester maternal plasma but was 53.3% in the third trimester maternal plasma. The numbers of validated hypermethylated loci were similar in the first (1,457 loci) and third trimester (1,279 loci) maternal plasma samples but there were substantially more hypomethylated loci in the first (21,812 loci) than the third trimester (12,677 loci) samples (table 1600 of FIG. 16).

D. Use of Markers

The differentially methylated markers, or DMRs, are useful in several aspects. The presence of such markers in maternal plasma indicates and confirms the presence of fetal or placental DNA. This confirmation can be used as a quality control for noninvasive prenatal testing. DMRs can serve as generic fetal DNA markers in maternal plasma and have advantages over markers that rely on genotyping differences between the mother and fetus, such as polymorphism based markers or those based on chromosome Y. DMRs are generic fetal markers that are useful for all pregnancies. The polymorphism based markers are only applicable to the subset of pregnancies where the fetus has inherited the marker from its father and where the mother does not possess this marker in her genome. In addition, one could measure the fetal DNA concentration in a maternal plasma sample by quantifying the DNA molecules originating from those DMRs. By knowing the profile of DMRs expected for normal pregnancies, pregnancy-associated complications, particularly those involving placental tissue changes, could be detected by observing a deviation in the maternal plasma DMR profile or methylation profile from that expected for normal pregnancies. Pregnancy-associated complications that involve placental tissue changes include, but are not limited to, fetal chromosomal aneuploidies, such as trisomy 21, preeclampsia, intrauterine growth retardation and preterm labor.

E. Kits Using Markers

Embodiments can provide compositions and kits for practicing the methods described herein and other applicable methods. Kits can be used for carrying out assays for analyzing fetal DNA, e.g., cell-free fetal DNA in maternal plasma. In one embodiment, a kit can include at least one oligonucleotide useful for specific hybridization with one or more loci identified herein. A kit can also include at least one oligonucleotide useful for specific hybridization with one or more reference loci. In one embodiment, placental hypermethylated markers are measured. The test locus may be the methylated DNA in maternal plasma and the reference locus may be the methylated DNA in maternal plasma. A similar kit could be composed for analyzing tumor DNA in plasma.

In some cases, the kits may include at least two oligonucleotide primers that can be used in the amplification of at least a section of a target locus (e.g., a locus in the appendix) and a reference locus. Instead of or in addition to primers, a kit can include labeled probes for detecting a DNA fragment corresponding to a target locus and a reference locus. In various embodiments, one or more oligonucleotides of the kit correspond to a locus in the tables of the appendix. Typically, the kits also provide instruction manuals to guide users in analyzing test samples and assessing the state of physiology or pathology in a test subject.

In various embodiments, a kit for analyzing fetal DNA in a biological sample containing a mixture of fetal DNA and DNA from a female subject pregnant with a fetus is provided. The kit may comprise one or more oligonucleotides for specifically hybridizing to at least a section of a genomic region listed in tables S2A, S2B, S2C, S2D, S3A, S3B, S3C, and S3D. Thus, any number of oligonucleotides from across the tables or just from one table may be used. The oligonucleotides may act as primers, and may be organized as pairs of primers, where a pair corresponds to a particular region from the tables.

VI. Relationship of Size and Methylation Density

Plasma DNA molecules are known to exist in circulation in the form of short molecules, with the majority of molecules about 160 bp in length (Y. M. D. Lo et al. 2010 Sci Transl Med; 2: 61ra91, Y. W. Zheng et al. 2012 Clin Chem; 58: 549-558). Interestingly, our data revealed a relationship between the methylation status and the size of plasma DNA molecules. Thus, plasma DNA fragment length is linked to DNA methylation level. The characteristic size profiles of plasma DNA molecules suggest that the majority are associated with mononucleosomes, possibly derived from enzymatic degradation during apoptosis.

Circulating DNA is fragmented in nature. In particular, circulating fetal DNA is shorter than maternally-derived DNA in maternal plasma samples (K. C. A. Chan et al. 2004 Clin Chem; 50: 88-92). As paired-end alignment enables the size analysis of bisulfite-treated DNA, one could assess directly if any correlation exists between the size of plasma DNA molecules and their respective methylation levels. We explored this in the maternal plasma as well as a non-pregnant adult female control plasma sample.

Paired-end sequencing for both ends of each DNA molecule was used to analyze each sample in this study. By aligning the pair of end sequences of each DNA molecule to the reference human genome and noting the genome coordinates of the extreme ends of the sequenced reads, one can determine the lengths of the sequenced DNA molecules. Plasma DNA molecules are naturally fragmented into small molecules and the sequencing libraries for plasma DNA are typically prepared without any fragmentation steps. Hence, the lengths deduced by the sequencing represented the sizes of the original plasma DNA molecules.
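The length determination from paired-end alignments can be illustrated with a minimal Python sketch. It assumes 1-based, inclusive alignment coordinates on the same chromosome for both reads of a pair; the function name and the example coordinates are hypothetical.

```python
def fragment_length(read1_start, read1_end, read2_start, read2_end):
    """Length of the original molecule from the extreme ends of its two reads."""
    leftmost = min(read1_start, read2_start)
    rightmost = max(read1_end, read2_end)
    return rightmost - leftmost + 1

# Two 50-bp reads whose outermost coordinates are 1,000 and 1,166.
length = fragment_length(1000, 1049, 1117, 1166)
```

In this example the outermost coordinates span 167 bp, which is taken as the size of the plasma DNA molecule.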

In a previous study, we determined the size profiles of the fetal and maternal DNA molecules in maternal plasma (Y. M. D. Lo et al. 2010 Sci Transl Med; 2: 61ra91). We showed that the plasma DNA molecules had sizes that resembled mononucleosomes and fetal DNA molecules were shorter than the maternal ones. In this study, we have determined the relationship between the methylation status of plasma DNA molecules and their sizes.

A. Results

FIG. 17A is a plot 1700 showing size distribution of maternal plasma, non-pregnant female control plasma, placental and peripheral blood DNA. For the maternal sample and the non-pregnant female control plasma, the two bisulfite-treated plasma samples displayed the same characteristic size distribution as previously reported (Y. M. D. Lo et al. 2010 Sci Transl Med; 2: 61ra91) with the most abundant total sequences of 166-167 bp in length and a 10-bp periodicity of DNA molecules shorter than 143 bp.

FIG. 17B is a plot 1750 of size distribution and methylation profile of maternal plasma, adult female control plasma, placental tissue and adult female control blood. For DNA molecules of the same size and containing at least one CpG site, their mean methylation density was calculated. We then plotted the relationship between the sizes of the DNA molecules and their methylation densities. Specifically, the mean methylation density was determined for each fragment length ranging from 50 bp up to 180 bp for sequenced reads covering at least 1 CpG site. Interestingly, the methylation density increased with the plasma DNA size and peaked at around 166-167 bp. This pattern, however, was not observed in the placenta and control blood DNA samples which were fragmented using an ultrasonicator system.
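The size-versus-methylation analysis can be sketched in Python as follows. The sketch assumes that the mean methylation density for a given size is the average of the per-molecule densities (methylated CpG sites divided by CpG sites covered); pooling the counts across molecules would be an alternative reading of the text. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def mean_density_by_size(fragments, min_size=50, max_size=180):
    """Mean methylation density for each fragment length.

    fragments: iterable of (size_bp, methylated_cpgs, total_cpgs) per molecule.
    Molecules covering no CpG site, or outside the size range, are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for size, methylated, total in fragments:
        if total < 1 or not (min_size <= size <= max_size):
            continue
        sums[size] += methylated / total
        counts[size] += 1
    return {size: sums[size] / counts[size] for size in sorted(counts)}
```

Plotting the returned values against the fragment length would give a curve of the kind shown in plot 1750.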

FIG. 18 shows plots of methylation densities and size of plasma DNA molecules. FIG. 18A is a plot 1800 for the first trimester maternal plasma. FIG. 18B is a plot 1850 for the third trimester maternal plasma. Data for all the sequenced reads that covered at least one CpG site are represented by the blue curve 1805. Data for reads that also contained a fetal-specific SNP allele are represented by the red curve 1810. Data for reads that also contained a maternal-specific SNP allele are represented by the green curve 1815.

Reads that contained a fetal-specific SNP allele were considered fetal DNA molecules. Reads that contained a maternal-specific SNP allele were considered maternal DNA molecules. In general, DNA molecules with high methylation densities were longer in size. This trend was present in both the fetal and maternal DNA molecules in both the first and third trimesters. The overall sizes of the fetal DNA molecules were shorter than the maternal ones as previously reported.

FIG. 19A shows a plot 1900 of methylation densities and the sizes of sequenced reads for an adult non-pregnant female. The plasma DNA sample from the adult non-pregnant female also showed the same relationship between the sizes and methylation state of the DNA molecules. On the other hand, the genomic DNA samples were fragmented by an ultrasonication step before MPS analysis. As shown in plot 1900, the data from the blood cell and placental tissue samples did not reveal the same trend. Since the fragmentation of the cells is artificial, one would expect to have no relationship of size and density. Since the naturally fragmented DNA molecules in plasma do show a dependence on size, it can be presumed that the lower methylation densities make it more likely for molecules to break into smaller fragments.

FIG. 19B is a plot 1950 showing size distribution and methylation profile of fetal-specific and maternal-specific DNA molecules in maternal plasma. Fetal-specific and maternal-specific plasma DNA molecules also exhibited the same correlation between fragment size and methylation level. The fragment lengths of both placenta-derived and maternal circulating cell-free DNA increased with the methylation level. Moreover, the distribution of their methylation status did not overlap with each other, suggesting that the phenomenon exists irrespective of the original fragment length of the sources of circulating DNA molecules.

B. Method

Accordingly, a size distribution can be used to estimate a total methylation percentage of a plasma sample. This methylation measurement can then be tracked during pregnancy or during cancer treatment by serial measurement of the size distributions of the plasma DNA according to the relationship shown in FIGS. 18A and 18B. The methylation measurement can also be used to look for increased or decreased release of DNA from an organ or a tissue of interest. For example, one can specifically look for DNA methylation signatures specific to a specific organ (e.g. the liver) and measure the concentrations of these signatures in plasma. As DNA is released into plasma when cells die, an increase in levels could mean an increase in cell death or damage in that particular organ or tissue. A decrease in level from a particular organ can mean that treatment to counter damage or pathological processes in that organ is under control.

FIG. 20 is a flowchart of a method 2000 for estimating a methylation level of DNA in a biological sample of an organism according to embodiments of the present invention. The methylation level can be estimated for a particular region of a genome or the entire genome. If a specific region is desired, then DNA fragments only from that specific region may be used.

At block 2010, amounts of DNA fragments corresponding to various sizes are measured. For each size of a plurality of sizes, an amount of a plurality of DNA fragments from the biological sample corresponding to the size can be measured. For instance, the number of DNA fragments having a length of 140 bases may be measured. The amounts may be saved as a histogram. In one embodiment, a size of each of the plurality of nucleic acids from the biological sample is measured, which may be done on an individual basis (e.g., by single molecule sequencing of a whole molecule or just ends of the molecule) or on a group basis (e.g., via electrophoresis). The sizes may correspond to a range. Thus, an amount can be for DNA fragments that have a size within a particular range. When paired-end sequencing is performed, the DNA fragments (as determined by the paired sequence reads) mapping (aligning) to a particular region may be used to determine the methylation level of the region.

At block 2020, a first value of a first parameter is calculated based on the amounts of DNA fragments at multiple sizes. In one aspect, the first parameter provides a statistical measure of a size profile (e.g., a histogram) of DNA fragments in the biological sample. The parameter may be referred to as a size parameter since it is determined from the sizes of the plurality of DNA fragments.

The first parameter can be of various forms. One parameter is the percentage of DNA fragments of a particular size or range of sizes relative to all DNA fragments or relative to DNA fragments of another size or range. Such a parameter is a number of DNA fragments at a particular size divided by the total number of fragments, which may be obtained from a histogram (any data structure providing absolute or relative counts of fragments at particular sizes). As another example, a parameter could be a number of fragments at a particular size or within a particular range divided by a number of fragments of another size or range. The division can act as a normalization to account for a different number of DNA fragments being analyzed for different samples. A normalization can also be accomplished by analyzing a same number of DNA fragments for each sample, which effectively provides a same result as dividing by a total number of fragments analyzed. Additional examples of parameters and further details of size analysis can be found in U.S. patent application Ser. No. 13/789,553, which is incorporated by reference for all purposes.
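The two forms of size parameter described above can be sketched as follows. This is a minimal illustration, assuming the histogram is held as a mapping from fragment size (in bases) to fragment count; the function names, the size ranges, and the counts are hypothetical and are not taken from the disclosure.

```python
def size_fraction(histogram, lo, hi):
    """Fraction of all fragments whose size lies in [lo, hi] (inclusive)."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram contains no fragments")
    in_range = sum(n for size, n in histogram.items() if lo <= size <= hi)
    return in_range / total


def size_ratio(histogram, range_a, range_b):
    """Count of fragments in range_a divided by count in range_b."""
    count_a = sum(n for s, n in histogram.items() if range_a[0] <= s <= range_a[1])
    count_b = sum(n for s, n in histogram.items() if range_b[0] <= s <= range_b[1])
    if count_b == 0:
        raise ValueError("no fragments in the denominator range")
    return count_a / count_b


# Hypothetical histogram: fragment size (bases) -> number of fragments.
example_histogram = {100: 50, 140: 200, 166: 600, 180: 150}

short_fraction = size_fraction(example_histogram, 100, 150)              # 250 / 1000
short_to_long = size_ratio(example_histogram, (100, 150), (160, 200))    # 250 / 750
```

Either quantity is normalized by a fragment count, so samples with different numbers of analyzed fragments remain comparable.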

At block 2030, the first size value is compared to a reference size value. The reference size value can be calculated from DNA fragments of a reference sample. To determine the reference size values, the methylation profile can be calculated and quantified for a reference sample, as well as a value of the first size parameter. Thus, when the first size value is compared to the reference size value, a methylation level can be determined.

At block 2040, the methylation level is estimated based on the comparison. In one embodiment, one can determine if the first value of the first parameter is above or below the reference size value, and thereby determine if the methylation level of the instant sample is above or below the methylation level corresponding to the reference size value. In another embodiment, the comparison is accomplished by inputting the first value into a calibration function. The calibration function can effectively compare the first value to calibration values (a set of reference size values) by identifying the point on a curve corresponding to the first value. The estimated methylation level is then provided as the output value of the calibration function.

Accordingly, one can calibrate a size parameter to a methylation level. For example, a methylation level can be measured and associated with a particular size parameter for that sample. Then data points from various samples can be fit to a calibration function. In one implementation, different calibration functions can be used for different subsets of DNA. Thus, there may be some form of calibration based on prior knowledge about the relationship between methylation and size for a particular subset of DNA. For example, the calibration for fetal and maternal DNA could be different.
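As a rough sketch of such a calibration, the snippet below fits a straight line to (size parameter, methylation level) pairs from reference samples and returns a function that maps a new size value to an estimated methylation level. The linear form is an assumption made only for illustration; the disclosure does not specify the shape of the calibration curve. The function name and the calibration points are hypothetical.

```python
def fit_linear_calibration(size_values, methylation_levels):
    """Ordinary least-squares line through the calibration points.

    Returns a function mapping a size-parameter value to an estimated
    methylation level. A linear relationship is assumed for illustration.
    """
    n = len(size_values)
    if n < 2 or n != len(methylation_levels):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(size_values) / n
    mean_y = sum(methylation_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in size_values)
    if sxx == 0:
        raise ValueError("size values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(size_values, methylation_levels))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def calibrate(size_value):
        return slope * size_value + intercept

    return calibrate


# Hypothetical reference samples: fraction of short fragments versus
# measured methylation density (%). More short fragments, lower methylation.
calibrate = fit_linear_calibration([0.10, 0.20, 0.30], [70.0, 65.0, 60.0])
estimated_methylation = calibrate(0.25)
```

Separate calibration functions could be fitted in the same way for different subsets of DNA, for example one for fetal and one for maternal fragments.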

As shown above, the placenta is more hypomethylated when compared with maternal blood, and thus the fetal DNA is smaller due to the lower methylation. Accordingly, an average size of the fragments of a sample (or other statistical value) can be used to estimate the methylation density. As the fragment sizes can be measured using paired-end sequencing, rather than the potentially technically more complex methylation-aware sequencing, this approach would potentially be cost-effective if used clinically. This approach can be used for monitoring the methylation changes associated with the progress of pregnancy, or with pregnancy-associated disorders such as preeclampsia, preterm labor and fetal disorders (such as those caused by chromosomal or genetic abnormalities or intrauterine growth retardation).

In another embodiment, this approach can be used for detecting and monitoring cancer. For example, with the successful treatment of cancer, the methylation profile in plasma or another bodily fluid as measured using this size-based approach would change towards that of healthy individuals without cancer. Conversely, in the event that the cancer is progressing, then the methylation profile in plasma or another bodily fluid would diverge from that of healthy individuals without cancer.

In summary, the hypomethylated molecules were shorter than the hypermethylated ones in plasma. The same trend was observed in both the fetal and maternal DNA molecules. Since DNA methylation is known to influence nucleosome packing, our data suggest that perhaps the hypomethylated DNA molecules were less densely packed with histones and were therefore more susceptible to enzymatic degradation. On the other hand, the data presented in FIGS. 18A and 18B also showed that despite the fetal DNA being much more hypomethylated than the maternal DNA, the size distributions of the fetal and maternal DNA do not separate from one another completely. In FIG. 19B, one can see that even for the same size category, the methylation levels of fetal- and maternal-specific reads differ from one another. This observation suggests that the hypomethylated state of fetal DNA is not the only factor that accounted for its relative shortness with reference to the maternal DNA.

VII. Imprinting Status of Gene Loci

Fetal-derived DNA molecules that share the same genotype as the mother but carry different epigenetic signatures can be detected in maternal plasma (L. L. Poon et al. 2002 Clin Chem; 48: 35-41). To demonstrate that the sequencing approach is sensitive in picking up fetal-derived DNA molecules in maternal plasma, we applied the same strategy to detect the imprinted fetal alleles in a maternal plasma sample. Two genomic imprinted regions were identified: H19 (chr11:1,977,419-1,977,821, NCBI Build36/hg18) and MEST (chr7:129,917,976-129,920,347, NCBI Build36/hg18). Both of them contain informative SNPs for differentiation between the maternal and fetal sequences. For H19, a maternally expressed gene, the mother was homozygous (A/A) and the fetus was heterozygous (A/C) for the SNP rs2071094 (chr11:1,977,740) in the region. One of the maternal A alleles is fully methylated and the other is unmethylated. In the placenta, however, the A allele is unmethylated while the paternally inherited C allele is fully methylated. We detected two methylated reads with the C genotype, corresponding to the imprinted paternal alleles derived from the placenta, in maternal plasma.

MEST, also known as PEG1, is a paternally expressed gene. Both the mother and the fetus were heterozygous (A/G) for the SNP rs2301335 (chr7:129,920,062) within the imprinted locus. The G allele is methylated while the A allele is unmethylated in maternal blood. The methylation pattern is reversed in the placenta, with the maternal A allele being methylated and the paternal G allele unmethylated. Three unmethylated G alleles, which were paternally derived, were detectable in maternal plasma. In contrast, VAV1, a non-imprinted gene locus on chromosome 19 (chr19:6,723,621-6,724,121), did not display any allelic methylation pattern in either the tissue or the plasma DNA samples.

Thus, methylation status can be used to determine which DNA fragments are from the fetus. For example, just detecting the A allele in maternal plasma cannot be used as a fetal marker when the mother is GA heterozygous. But if one distinguishes the methylation status of the A molecules in plasma, the methylated A molecules are fetal-specific while the unmethylated A molecules are maternal-specific, or vice versa.

We next focused on loci that have been reported to demonstrate genomic imprinting in placental tissues. Based on the list of loci reported by Woodfine et al. (2011 Epigenetics Chromatin; 4: 1), we further sorted for those that contained SNPs within the imprinting control region. Four loci fulfilled the criteria: H19, KCNQ1OT1, MEST and NESP.

Regarding the reads of the maternal blood cell sample for H19 and KCNQ1OT1, the maternal reads were homozygous for the SNP and there were approximately equal proportions of methylated and unmethylated reads. The CVS and term placental tissue samples revealed that the fetus was heterozygous for both loci and each allele was either exclusively methylated or unmethylated, i.e. showing monoallelic methylation. In the maternal plasma samples, the paternally inherited fetal DNA molecules were detected for both loci. For H19, the paternally inherited molecules were represented by the sequenced reads that contained the fetal-specific allele and were methylated. For KCNQ1OT1, the paternally inherited molecules were represented by the sequenced reads that contained the fetal-specific allele and were unmethylated.

On the other hand, the mother was heterozygous for both MEST and NESP. For MEST, both the mother and fetus were GA heterozygotes for the SNP. However, as evident from the data for the Watson strand for the maternal blood cells and placental tissue, the methylation status for the CpGs adjacent to the SNP was opposite in the mother and fetus. The A-allele was unmethylated in the mother's DNA but methylated in the fetus's DNA. For MEST, the maternal allele was methylated. Hence, one could pinpoint that the fetus had inherited the A-allele from its mother (methylated in the CVS) and the mother had inherited the A-allele from her father (unmethylated in the maternal blood cells). Interestingly, in the maternal plasma samples, all four groups of molecules could be readily distinguished, including each of the two alleles of the mother and each of the two alleles of the fetus. Thus, by combining the genotype information with the methylation status at the imprinted loci, we could readily distinguish the maternally inherited fetal DNA molecules from the background maternal DNA molecules (L. L. Poon et al. 2002).

This approach could be used to detect uniparental disomy. For example, if the father of this fetus is known to be homozygous for the G-allele, the failure to detect the unmethylated G-allele in maternal plasma signifies the lack of contribution of the paternal allele. In addition, under such a circumstance, when both methylated G-allele and methylated A-allele were detected in the plasma of this pregnancy, it would suggest that the fetus has heterodisomy from the mother, i.e. inheriting two different alleles from the mother with no inheritance from the father. Alternatively, if both methylated A-allele (fetal allele inherited from the mother) and unmethylated A-allele (maternal allele inherited from the maternal grandfather) were detected in maternal plasma without the unmethylated G-allele (paternal allele that should have been inherited by the fetus), it would suggest that the fetus has isodisomy from the mother, i.e. inheriting two identical alleles from the mother with no inheritance from the father.

For NESP, the mother was a GA heterozygote at the SNP while the fetus was homozygous for the G-allele. The paternal allele was methylated for NESP. In the maternal plasma samples, the paternally inherited fetal G-alleles that were methylated could be readily distinguished from the background maternal G-alleles which were unmethylated.

VIII. Cancer/Donors

Some embodiments can be used for the detection, screening, monitoring (e.g. for relapse, remission, or response (e.g. presence or absence) to treatment), staging, classification (e.g. for aid in choosing the most appropriate treatment modality) and prognostication of cancer using methylation analysis of circulating plasma/serum DNA.

Cancer DNA is known to demonstrate aberrant DNA methylation (J. G. Herman et al. 2003 N Engl J Med; 349: 2042-2054). For example, the CpG island promoters of genes, e.g. tumor suppressor genes, are hypermethylated while the CpG sites in the gene body are hypomethylated when compared with non-cancer cells. Provided that the methylation profile of the cancer cells could be reflected by the methylation profile of the tumor-derived plasma DNA molecules using methods herein described, we expect that the overall methylation profile in plasma would differ between individuals with cancer and healthy individuals without cancer or individuals whose cancer had been cured. The differences in the methylation profile could be quantitative differences in the methylation densities of the genome and/or methylation densities of segments of the genome. For example, due to the general hypomethylated nature of DNA from cancer tissues (Gama-Sosa M A et al. 1983 Nucleic Acids Res; 11: 6883-6894), a reduction in the methylation densities of the plasma methylome or of segments of the genome would be observed in the plasma of cancer patients.

Qualitative changes in the methylation profile should also be reflected among the plasma methylome data. For example, plasma DNA molecules originating from genes that are hypermethylated only in cancer cells would show hypermethylation in plasma of a cancer patient when compared with plasma DNA molecules originating from the same genes but in a sample of a healthy control. Because aberrant methylation occurs in most cancers, the methods herein described could be applied to the detection of all forms of malignancies with aberrant methylation, for example, malignancies in, but not limited to, the lung, breast, colorectum, prostate, nasopharynx, stomach, testes, skin, nervous system, bone, ovary, liver, hematologic tissues, pancreas, uterus, kidney, lymphoid tissues, etc. The malignancies may be of a variety of histological subtypes, for example, carcinomas, adenocarcinomas, sarcomas, fibroadenocarcinoma, neuroendocrine, undifferentiated.

On the other hand, we expect that tumor-derived DNA molecules can be distinguished from the background non-tumor-derived DNA molecules because the overall short size profile of tumor-derived DNA is accentuated for DNA molecules originating from loci with tumor-associated aberrant hypomethylation which would have an additional effect on the size of the DNA molecule. Also, tumor-derived plasma DNA molecules can be distinguished from the background non-tumor-derived plasma DNA molecules using multiple characteristic features that are associated with tumor DNA, including but not limited to single nucleotide variants, copy number gains and losses, translocations, inversions, aberrant hyper- or hypo-methylation and size profiling. As all of these changes could occur independently, the combined use of these features may provide additive advantage for the sensitive and specific detection of cancer DNA in plasma.

A. Size and Cancer

The sizes of tumor-derived DNA molecules in plasma also resemble the sizes of mononucleosomal units and are shorter than those of the background non-tumor-derived DNA molecules, which co-exist in the plasma of cancer patients. Size parameters have been shown to be correlated with cancer, as described in U.S. patent application Ser. No. 13/789,553, which is incorporated by reference for all purposes.

Since both fetal-derived and maternal-derived DNA in plasma showed a relationship between the size and methylation status of the molecule, tumor-derived DNA molecules are expected to exhibit the same trend. For example, the hypomethylated molecules would be shorter than the hypermethylated molecules in the plasma of cancer patients or in subjects screened for cancer.

B. Methylation Densities of Different Tissues in a Cancer Patient

In this example, we analyzed the plasma and tissue samples of a hepatocellular carcinoma (HCC) patient. Blood samples were collected from the HCC patient before and at 1 week after surgical resection of the tumor. Plasma and buffy coat were harvested after centrifugation of the blood samples. The resected tumor and the adjacent non-tumor liver tissue were collected. The DNA samples extracted from the plasma and tissue samples were analyzed using massively parallel sequencing with and without prior bisulfite treatment. The plasma DNA from four healthy individuals without cancer was also analyzed as controls. The bisulfite treatment of a DNA sample would convert the unmethylated cytosine residues to uracil. In the downstream polymerase chain reaction and sequencing, these uracil residues would behave as thymidine. On the other hand, the bisulfite treatment would not convert the methylated cytosine residues to uracil. After massively parallel sequencing, the sequencing reads were analyzed by Methy-Pipe (P. Jiang, et al. Methy-Pipe: An integrated bioinformatics data analysis pipeline for whole genome methylome analysis, paper presented at the IEEE International Conference on Bioinformatics and Biomedicine Workshops, Hong Kong, 18 to 21 Dec. 2010), to determine the methylation status of the cytosine residues at all CG dinucleotide positions, i.e. CpG sites.

FIG. 21A is a table 2100 showing the methylation densities of the pre-operative plasma and the tissue samples of an HCC patient. The CpG methylation density for the regions of interest (e.g. CpG sites, promoter, or repeat regions etc.) refers to the proportion of reads showing CpG methylation over the total number of reads covering genomic CpG dinucleotides. The methylation densities of the buffy coat and the non-tumoral liver tissue are similar. The overall methylation density of the tumor tissue, based on data from all autosomes, was 25% lower than those of the buffy coat and the non-tumoral liver tissue. The hypomethylation was consistent across each individual chromosome. The methylation density of the plasma was between the values of the non-malignant tissues and the cancer tissues. This observation is consistent with the fact that both cancer and non-cancer tissues would contribute to the circulating DNA of a cancer patient. It has been shown that the hematopoietic system is the main source of the circulating DNA in individuals without an active malignant condition (Y. Y. Lui, et al. 2002 Clin Chem; 48: 421-7). We therefore also analyzed plasma samples obtained from four healthy controls. The number of sequence reads and the sequencing depth achieved per sample are shown in table 2150 of FIG. 21B.
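The definition of methylation density given above can be illustrated with a short sketch. It assumes that, after bisulfite conversion, each CpG site has already been summarized as a pair of read counts: reads still showing C (methylated) and reads showing T (unmethylated). This input layout, the function name, and the counts are hypothetical; they do not describe the output format of Methy-Pipe.

```python
def methylation_density(cpg_calls):
    """Proportion of methylated calls among all calls covering CpG sites.

    cpg_calls: iterable of (c_reads, t_reads) per CpG site, where c_reads
    are reads retaining cytosine after bisulfite treatment (methylated) and
    t_reads are reads converted to thymine (unmethylated).
    """
    methylated = 0
    total = 0
    for c_reads, t_reads in cpg_calls:
        if c_reads < 0 or t_reads < 0:
            raise ValueError("read counts must be non-negative")
        methylated += c_reads
        total += c_reads + t_reads
    if total == 0:
        raise ValueError("no reads cover any CpG site in this region")
    return methylated / total


# Hypothetical counts for three CpG sites in one region of interest.
example_calls = [(8, 2), (5, 5), (9, 1)]
density = methylation_density(example_calls)  # 22 methylated of 30 calls
```

Restricting `cpg_calls` to the sites of one chromosome, one promoter, or one bin gives the density for that region under the same definition.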

FIG. 22 is a table 2200 showing that the methylation densities in the autosomes ranged from 71.2% to 72.5% in the plasma samples of the healthy controls. These data showed the expected level of DNA methylation in plasma samples obtained from individuals without a source of tumor DNA. In a cancer patient, the tumor tissue would also release DNA into the circulation (K. C. Chan et al. 2013 Clin Chem; 59: 211-224; R. J. Leary et al. 2012 Sci Transl Med; 4: 162ra154). Due to the hypomethylated nature of the HCC tumor, the presence of both tumor- and non-tumor-derived DNA in the pre-operative plasma of the patient resulted in a reduction in the methylation density when compared with plasma levels of healthy controls. In fact, the methylation density of the pre-operative plasma sample was between the methylation densities of the tumor tissue and the plasma of the healthy controls. This is because the methylation level of the plasma DNA of cancer patients would be influenced by the degree of aberrant methylation, hypomethylation in this case, of the tumor tissue and the fractional concentration of the tumor-derived DNA in the circulation. A lower methylation density of the tumor tissue and a higher fractional concentration of tumor-derived DNA in the circulation would lead to a lower methylation density of the plasma DNA in a cancer patient. Most tumors are reported to show global hypomethylation (J. G. Herman et al. 2003 N Engl J Med; 349: 2042-2054; Gama-Sosa M A et al. 1983 Nucleic Acids Res; 11: 6883-6894). Thus, the current observations seen in the HCC samples should also be applicable to other types of tumors.

In one embodiment, the methylation density of the plasma DNA can be used to determine the fractional concentration of tumor-derived DNA in a plasma/serum sample when the methylation level of the tumor tissue is known. The methylation level, e.g. methylation density, of the tumor tissue can be obtained if the tumor sample is available or a biopsy of the tumor is available. In another embodiment, the information regarding the methylation level of the tumor tissue can be obtained from a survey of the methylation level in a group of tumors of a similar type and this information (e.g. a mean level or a median level) is applied to the patient to be analyzed using the technology described in this invention. The methylation level of the tumor tissue can be determined by the analysis of the tumor tissue of the patient or inferred from the analysis of the tumor tissues of other patients with the same or a similar cancer type. The methylation of tumor tissues can be determined using a range of methylation-aware platforms, including but not limited to massively parallel sequencing, single molecule sequencing, microarray (such as methylated cytosine immunoprecipitation or methylation-aware restriction enzyme digestion followed by microarray analysis, or oligonucleotide arrays), or mass spectrometry (such as the Epityper, Sequenom, Inc., analysis). When the methylation level of a tumor is known, the fractional concentration of tumor DNA in the plasma of cancer patients could be calculated after plasma methylome analysis.

The relationship between the plasma methylation level, P, the fractional tumor DNA concentration, f, and the tumor tissue methylation level, TUM, can be described as: P=BKG×(1−f)+TUM×f, where BKG is the background DNA methylation level in plasma derived from blood cells and other internal organs. For example, the overall methylation density of all autosomes was shown to be 42.9% in the tumor biopsy tissue obtained from this HCC patient, i.e. the TUM value for this case. The mean methylation density of the plasma samples from the four healthy controls was 71.6%, i.e. the BKG value of this case. The plasma methylation density for the pre-operative plasma was 59.7%. Using these values, f is estimated to be 41.5%.
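Solving P=BKG×(1−f)+TUM×f for f gives f=(BKG−P)/(BKG−TUM). The sketch below applies this rearrangement to the three values quoted above; the function name is hypothetical.

```python
def tumor_fraction(plasma, background, tumor):
    """Fractional tumor DNA concentration f from P = BKG*(1 - f) + TUM*f.

    All three arguments are methylation levels on the same scale
    (here, percent methylation density).
    """
    if background == tumor:
        raise ValueError("BKG and TUM must differ for f to be determinable")
    return (background - plasma) / (background - tumor)


# Values quoted in the text: P = 59.7%, BKG = 71.6%, TUM = 42.9%.
f = tumor_fraction(plasma=59.7, background=71.6, tumor=42.9)
f_percent = round(f * 100, 1)  # 11.9 / 28.7, about 41.5%
```

The estimate is only defined when the tumor and background methylation levels differ, and it becomes unstable when they are close to one another.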

In another embodiment, the methylation level of the tumor tissue can be estimated noninvasively based on the plasma methylome data when the fractional concentration of the tumor-derived DNA in the plasma sample is known. The fractional concentration of the tumor-derived DNA in the plasma sample can be determined by other genetic analysis, for example the genomewide analysis of allelic loss (GAAL) and the analysis of single nucleotide mutations as previously described (U.S. patent application Ser. No. 13/308,473; Chan K C et al. 2013 Clin Chem; 59: 211-24). The calculation is based on the same relationship described above except that in this embodiment, the value of f is known and the value of TUM becomes the unknown. The deduction can be performed for the whole genome or for parts of the genome, similar to the data observed for the context of determining the placental tissue methylation level from maternal plasma data.
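With f known, the relationship P=BKG×(1−f)+TUM×f rearranges to TUM=(P−BKG×(1−f))/f. The sketch below shows this deduction; the function name is hypothetical, and the example simply feeds back the whole-genome values quoted earlier (with f rounded to 41.5%) as a consistency check, not as independent data.

```python
def deduce_tumor_methylation(plasma, background, fraction):
    """Tumor methylation level TUM from P = BKG*(1 - f) + TUM*f with f known."""
    if not 0 < fraction <= 1:
        raise ValueError("fractional concentration must be in (0, 1]")
    return (plasma - background * (1 - fraction)) / fraction


# Round-trip check using P = 59.7%, BKG = 71.6% and f = 0.415.
tum = deduce_tumor_methylation(plasma=59.7, background=71.6, fraction=0.415)
```

The same function can be applied region by region, using the plasma and background methylation levels of each region, to deduce the tumor methylation level for parts of the genome.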

In another embodiment, one can use the inter-bin variation or profile in the methylation densities to differentiate subjects with cancer and those without cancer. The resolution of the methylation analysis can be further increased by dividing the genome into bins of a particular size, e.g., 1 Mb. In such an embodiment, the methylation density of each 1 Mb bin was calculated for the collected samples, e.g., buffy coat, the resected HCC tissue, the non-tumoral liver tissue adjacent to the tumor and the plasma collected before and after tumor resection. In another embodiment, the bin sizes do not need to be kept constant. In one implementation, the number of CpG sites is kept constant within each bin while the bin itself can vary in size.

FIGS. 23A and 23B show the methylation density of buffy coat, tumor tissue, non-tumoral liver tissue, the pre-operative plasma and post-operative plasma of the HCC patient. FIG. 23A is a plot 2300 of results for chromosome 1. FIG. 23B is a plot 2350 of results for chromosome 2.

For most of the 1 Mb windows, the methylation densities for the buffy coat and the non-tumoral liver tissue adjacent to the tumor were similar whereas those of the tumor tissues were lower. Similar to the results in the whole chromosome analyses as shown in Table 1, the methylation densities of the pre-operative plasma lie between those of the tumor and the non-malignant tissues. The methylation densities of the interrogated genomic regions in the tumor tissues could be deduced using the methylation data of the pre-operative plasma and the fractional tumor DNA concentration. The method is the same as described above using the methylation density values of all the autosomes. The deduction of the tumor methylation described can also be performed using this higher resolution methylation data of the plasma DNA. Other bin sizes, such as 300 kb, 500 kb, 2 Mb, 3 Mb, 5 Mb or more than 5 Mb can also be used. In one embodiment, the bin sizes do not need to be kept constant. In one implementation, the number of CpG sites is kept constant within each bin while the bin itself can vary in size.

C. Comparison of Plasma Methylation Density Between the Cancer Patient and Healthy Individuals

As shown in table 2100, the methylation densities of the pre-operative plasma DNA were lower than those of the non-malignant tissues in the cancer patient. This is likely to result from the presence of DNA from the tumor tissue which was hypomethylated. This lower plasma DNA methylation density can potentially be used as a biomarker for the detection and monitoring of cancer. For cancer monitoring, if a cancer is progressing, then there will be an increased amount of cancer-derived DNA in plasma with time. In this example, an increased amount of circulating cancer-derived DNA in plasma will lead to a further reduction in the plasma DNA methylation density on a genome-wide level.

Conversely, if a cancer responds to treatment, then the amount of cancer-derived DNA in plasma will decrease with time. In this example, a decrease in the amount of cancer-derived DNA in plasma will lead to an increase in the plasma DNA methylation density. For example, if a lung cancer patient with epidermal growth factor receptor mutation has been treated with a targeted therapy, e.g. tyrosine kinase inhibition, then an increase in plasma DNA methylation density would signify a response. Subsequently, the emergence of a tumor clone resistant to tyrosine kinase inhibition would be associated with a decrease in plasma DNA methylation density which would indicate a relapse.

Plasma methylation density measurements can be performed serially and the rate of change of such measurements can be calculated and used to predict or correlate with clinical progression or remission or prognosis. For selected genomic loci which are hypermethylated in cancer tissues but hypomethylated in normal tissues, e.g. the promoter regions of a number of tumor suppressor genes, the direction of change with cancer progression and with a favorable response to treatment will be opposite to the patterns described above.

To demonstrate the feasibility of this approach, we compared the DNA methylation densities of plasma samples collected from the cancer patient before and after surgical removal of the tumor with plasma DNA obtained from four healthy control subjects.

Table 2200 shows the DNA methylation densities of each autosome and the combined values of all autosomes of the pre-operative and post-operative plasma samples of the cancer patient and those of the four healthy control subjects. For all chromosomes, the methylation densities of the pre-operative plasma DNA sample were lower than those of the post-operative sample and the plasma samples from the four healthy subjects. The difference in the plasma DNA methylation densities between the pre-operative and post-operative samples provided supportive evidence that the lower methylation densities in the pre-operative plasma sample were due to the presence of DNA from the HCC tumor.

The reversal of the DNA methylation densities in the post-operative plasma sample to levels similar to those of the plasma samples of the healthy controls suggested that much of the tumor-derived DNA had disappeared due to the surgical removal of the source, i.e. the tumor. These data suggest that the methylation density of the pre-operative plasma, as determined using data available from large genomic regions, such as all autosomes or individual chromosomes, was sufficiently lower than that of the healthy controls to allow the identification, i.e. diagnosis or screening, of the test case as having cancer.

The data of the pre-operative plasma also showed a much lower methylation level than that of the post-operative plasma, indicating that the plasma methylation level could also be used to monitor the tumor load, and hence to prognosticate and monitor the progress of cancer in the patient. Reference values can be determined from plasma of healthy controls or persons at risk for the cancer but currently without cancer. Persons at risk for HCC include those with chronic hepatitis B or hepatitis C infection, those with hemochromatosis, and those with liver cirrhosis.

Plasma methylation density values beyond, for example lower than, a defined cutoff based on the reference values can be used to assess if a nonpregnant person's plasma has tumor DNA or not. To detect the presence of hypomethylated circulating tumor DNA, the cutoff can be defined as lower than the 5th or 1st percentile of the values of the control population, or based on a number of standard deviations, for example, 2 or 3 standard deviations (SDs), below the mean methylation density values of the controls, or based on determining a multiple of the median (MoM). For hypermethylated tumor DNA, the cutoff can be defined as higher than the 95th or 99th percentile of the values of the control population, or based on a number of standard deviations, for example, 2 or 3 SDs, above the mean methylation density values of the controls, or based on determining a multiple of the median (MoM). In one embodiment, the control population is matched in age to the test subject. The age matching does not need to be exact and can be performed in age bands (e.g. 30 to 40 years, for a test subject of 35 years).
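The SD-based cutoff for hypomethylated tumor DNA can be sketched as follows. The control values below are purely illustrative and are not the measurements of the four controls in this example; the use of the sample standard deviation (rather than the population SD) is also an assumption, as is every function name.

```python
import statistics


def hypomethylation_cutoff(control_values, n_sd=3.0):
    """Cutoff set n_sd sample standard deviations below the control mean."""
    if len(control_values) < 2:
        raise ValueError("at least two control values are needed for an SD")
    mean = statistics.mean(control_values)
    sd = statistics.stdev(control_values)  # sample SD (an assumption)
    return mean - n_sd * sd


def is_hypomethylated(test_value, control_values, n_sd=3.0):
    """True if the test value falls below the SD-based cutoff."""
    return test_value < hypomethylation_cutoff(control_values, n_sd)


# Illustrative control plasma methylation densities (%), not real data.
controls = [70.0, 71.0, 72.0, 73.0]
cutoff = hypomethylation_cutoff(controls, n_sd=3.0)
flagged = is_hypomethylated(59.7, controls)      # far below the cutoff
not_flagged = is_hypomethylated(71.0, controls)  # within the control range
```

A percentile-based or MoM-based cutoff would replace only the body of `hypomethylation_cutoff`; for hypermethylated tumor DNA the comparison and the sign of the SD term would be reversed.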

We next compared the methylation densities of 1 Mb bins between the plasma samples of the cancer patient and the four control subjects. For illustration purposes, the results of chromosome 1 are shown.

FIG. 24A is a plot 2400 showing the methylation densities of the pre-operative plasma from the HCC patient. FIG. 24B is a plot 2450 showing the methylation densities of the post-operative plasma from the HCC patient. The blue dots represent the results of the control subjects, and the red dots represent the results of the plasma sample of the HCC patient.

As shown in FIG. 24A, the methylation densities of the pre-operative plasma from the HCC patient were lower than those of the control subjects for most of the bins. Similar patterns were observed for other chromosomes. As shown in FIG. 24B, the methylation densities of the post-operative plasma from the HCC patient were similar to those of the control subjects for most of the bins. Similar patterns were observed for other chromosomes.

To assess if a tested subject has cancer, the result of the tested subject would be compared to the values of a reference group. In one embodiment, the reference group can comprise a number of healthy subjects. In another embodiment, the reference group can comprise subjects with non-malignant conditions, for example, chronic hepatitis B infection or cirrhosis. The difference in the methylation densities between the tested subject and the reference group can then be quantified.

In one embodiment, a reference range can be derived from the values of the control group. Then deviations in the result of the tested subject from the upper or lower limits of the reference group can be used to determine if the subject has a tumor. This quantity would be affected by the fractional concentration of tumor-derived DNA in the plasma and the difference in the level of methylation between malignant and non-malignant tissues. A higher fractional concentration of tumor-derived DNA in plasma would lead to larger methylation density differences between the test plasma sample and the controls. A larger degree of difference in the methylation level of the malignant and non-malignant tissues is also associated with larger methylation density differences between the test plasma sample and the controls. In yet another embodiment, different reference groups are chosen for test subjects of different age ranges.

In another embodiment, the mean and SD of the methylation densities of the four control subjects were calculated for each 1 Mb bin. Then for corresponding bins, the difference between the methylation densities of the HCC patient and the mean value of the control subjects was calculated. In one embodiment, this difference was then divided by the SD of the corresponding bin to determine the z-score. In other words, the z-score represents the difference in methylation densities between the test and control plasma samples expressed as a number of SDs from the mean of the control subjects. A z-score >3 of a bin indicates that the plasma DNA of the HCC patient is more hypermethylated than the control subjects by more than 3 SDs in that bin, whereas a z-score of <−3 in a bin indicates that the plasma DNA of the HCC patient is more hypomethylated than the control subjects by more than 3 SDs in that bin.
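
The per-bin z-score calculation described above can be sketched as follows. This is a minimal illustration and not the implementation used to generate the figures: the function name `bin_z_scores`, the input layout, and the use of the sample standard deviation are assumptions, and the toy values are invented for the example.

```python
from statistics import mean, stdev


def bin_z_scores(test_densities, control_densities):
    """Return one z-score per bin for a test plasma sample.

    test_densities: per-bin methylation densities of the test sample.
    control_densities: one list of per-bin densities per control subject.
    """
    z_scores = []
    for i, test_value in enumerate(test_densities):
        controls = [subject[i] for subject in control_densities]
        mu = mean(controls)
        # Assumption: sample SD (n - 1 denominator); needs at least 2 controls.
        sd = stdev(controls)
        # A bin in which all controls are identical (sd == 0) would raise
        # ZeroDivisionError and would need separate handling.
        z_scores.append((test_value - mu) / sd)
    return z_scores


# Toy data (hypothetical): four control subjects, two bins.
controls = [[0.70, 0.72], [0.72, 0.70], [0.71, 0.71], [0.71, 0.71]]
z = bin_z_scores([0.60, 0.71], controls)
# The first bin is far below the control mean (z < -3, hypomethylated);
# the second bin lies within the control range.
```

With only four control subjects the per-bin SD is itself a noisy estimate, so a larger reference group would be expected to give more stable z-scores.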

FIGS. 25A and 25B show z-scores of the plasma DNA methylation densities for the pre-operative (plot 2500) and post-operative (plot 2550) plasma samples of the HCC patient using the plasma methylome data of the four healthy control subjects as reference for chromosome 1. Each dot represents the result of one 1 Mb bin. The black dots represent the bins with z-score between −3 and 3. Red dots represent bins with z-score <−3.

FIG. 26A is a table 2600 showing data for z-scores for pre-operative and post-operative plasma. Most of the bins on chromosome 1 (80.9%) in the pre-operative plasma sample had a z-score of <−3, indicating that the pre-operative plasma DNA of the HCC patient was significantly more hypomethylated than that of the control subjects. In contrast, the number of red dots decreased substantially in the post-operative plasma sample (8.3% of the bins on chromosome 1), suggesting that most of the tumor DNA had been removed from the circulation due to surgical resection of the source of circulating tumor DNA.

FIG. 26B is a Circos plot 2620 showing the z-score of the plasma DNA methylation densities for the pre-operative and post-operative plasma samples of the HCC patient using the four healthy control subjects as reference for 1 Mb bins analyzed from all autosomes. The outermost ring shows the ideograms of the human autosomes. The middle ring shows the data for the pre-operative plasma sample. The innermost ring shows the data for the post-operative plasma sample. Each dot represents the result of one 1 Mb bin. The black dots represent the bins with z-scores between −3 and 3. The red dots represent bins with z-scores <−3. The green dots represent bins with z-scores >3.

FIG. 26C is a table 2640 showing a distribution of the z-scores of the 1 Mb bins for the whole genome in both the pre-operative and post-operative plasma samples of the HCC patient. The results indicate that the pre-operative plasma DNA of the HCC patient was more hypomethylated than that of the controls for the majority of regions (85.2% of the 1 Mb bins) in the whole genome. In contrast, the majority of the regions (93.5% of the 1 Mb bins) in the post-operative plasma sample showed no significant hypermethylation or hypomethylation compared with controls. These data indicate that much of the tumor DNA, mainly hypomethylated in nature for this HCC, was no longer present in the post-operative plasma sample.

In one embodiment, the number, percentage or proportion of bins with z-scores <−3 can be used to indicate if a cancer is present. For example, as shown in table 2640, 2330 of the 2734 bins analyzed (85.2%) showed z-scores <−3 in the pre-operative plasma while only 171 of the 2734 analyzed bins (6.3%) showed z-scores <−3 in the post-operative plasma. The data indicated that the tumor DNA load in the pre-operative plasma was much higher than in the post-operative plasma.

The cutoff values of the number of bins may be determined using statistical methods. For example, approximately 0.15% of the bins would be expected to have a z-score of <−3 based on a normal distribution. Therefore, the cutoff number of bins can be 0.15% of the total number of bins being analyzed. In other words, if a plasma sample from a nonpregnant individual shows more than 0.15% of bins with z-scores <−3, there is a source of hypomethylated DNA in plasma, namely cancer. For example, 0.15% of the 2734 1 Mb bins that we have analyzed in this example is about 4 bins. Using this value as a cutoff, both the pre-operative and post-operative plasma samples contained hypomethylated tumor-derived DNA, though the amount was much greater in the pre-operative plasma sample than in the post-operative plasma sample. For the four healthy control subjects, none of the bins showed significant hypermethylation or hypomethylation.
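
The bin-counting rule above can be expressed in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the 0.15% expectation is the approximate normal-distribution figure quoted in the text. The counts 2330, 171 and 2734 are taken from table 2640 as described above.

```python
def fraction_below(z_scores, z_cutoff=-3.0):
    """Fraction of bins whose z-score is below z_cutoff."""
    return sum(1 for z in z_scores if z < z_cutoff) / len(z_scores)


def has_excess_hypomethylation(z_scores, expected_fraction=0.0015):
    """True if more bins fall below -3 than the ~0.15% expected by chance."""
    return fraction_below(z_scores) > expected_fraction


total_bins = 2734
cutoff_bins = round(0.0015 * total_bins)  # about 4 bins
pre_op_fraction = 2330 / total_bins       # about 85.2% of bins
post_op_fraction = 171 / total_bins       # about 6.3% of bins
```

Both fractions are far above the 0.15% expectation, which is consistent with the statement that both samples contained hypomethylated tumor-derived DNA.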

In another embodiment, the cutoff number can be determined by receiver operating characteristic (ROC) curve analysis by analyzing a number of cancer patients and individuals without cancer. To further validate the specificity of this approach, a plasma sample from a patient seeking medical consultation for a non-malignant condition (C06) was analyzed. 1.1% of the bins had a z-score of <−3. In one embodiment, different thresholds can be used to classify different levels of disease status. A lower percentage threshold can be used to differentiate healthy status from benign conditions and a higher percentage threshold to differentiate benign conditions from malignancies.

In yet another embodiment, the sum of the z-scores for all the bins can be used to determine if cancer is present or used for the monitoring of the serial changes of the level of plasma DNA methylation. Due to the overall hypomethylated nature of tumor DNA, the sum of z-scores would be lower in plasma collected from an individual with cancer than in healthy controls. The sums of z-scores for the pre- and post-operative plasma samples of the HCC patient were −49843.8 and −3132.13, respectively.

In other embodiments, other methods can be used to survey the methylation level of plasma DNA. For example, the proportion of methylated cytosine residues over the total content of cytosine residues can be determined using mass spectrometry (M. L. Chen et al. 2013 Clin Chem; doi: 10.1373/clinchem.2012.193938) or massively parallel sequencing. However, as most of the cytosine residues are not in the CpG dinucleotide context, the proportion of methylated cytosine among total cytosine residues would be relatively small when compared to methylation levels estimated in the context of CpG dinucleotides. We determined the methylation level of the tissue and plasma samples obtained from the HCC patient as well as the four plasma samples obtained from the healthy controls. The methylation levels were measured in the context of CpGs, of any cytosines, and in the CHG and CHH contexts using the genome-wide massively parallel sequencing data. H refers to adenine, thymine or cytosine residues.

FIG. 26D is a table 2660 showing that the methylation levels of the tumor tissue and pre-operative plasma sample overlapped with those of some of the control plasma samples when using the CHH and CHG contexts. The methylation levels of the tumor tissue and pre-operative plasma sample were consistently lower when compared with the buffy coat, non-tumor liver tissue, post-operative plasma sample and healthy control plasma samples among both the CpGs and unspecified cytosines. However, the data based on the methylated CpGs, i.e. methylation densities, showed a wider dynamic range than the data based on the methylated cytosines.

In other embodiments, the methylation status of the plasma DNA can be determined by methods using antibodies against methylated cytosine, for example, methylated DNA immunoprecipitation (MeDIP). However, the precision of these methods is expected to be inferior to that of sequencing-based methods because of the variability in antibody binding. In yet another embodiment, the level of 5-hydroxymethylcytosine in plasma DNA can be determined. In this regard, a reduction in the level of 5-hydroxymethylcytosine has been found to be an epigenetic feature of certain cancers, e.g. melanoma (C. G. Lian, et al. 2012 Cell; 150: 1135-1146).

In addition to HCC, we also investigated if this approach could be applied to other types of cancers. We analyzed the plasma samples from 2 patients with adenocarcinoma of the lung (CL1 and CL2), 2 patients with nasopharyngeal carcinoma (NPC1 and NPC2), 2 patients with colorectal cancer (CRC1 and CRC2), 1 patient with metastatic neuroendocrine tumor (NE1) and 1 patient with metastatic smooth muscle sarcoma (SMS1). The plasma DNA of these subjects was bisulfite-converted and sequenced using the Illumina HiSeq2000 platform for 50 bp at one end. The four healthy control subjects mentioned above were used as a reference group for the analysis of these 8 patients. The whole genome was divided into 1 Mb bins. The mean and SD of methylation density were calculated for each bin using the data from the reference group. Then the results of the 8 cancer patients were expressed as z-scores, which represent the number of SDs from the mean of the reference group. A negative value indicates that the methylation density of the test case is lower than the mean of the reference group, and vice versa. The number of sequence reads and the sequencing depth achieved per sample are shown in table 2780 of FIG. 27I.

FIGS. 27A-H show Circos plots of methylation density of 8 cancer patients according to embodiments of the present invention. Each dot represents the result of a 1 Mb bin. The black dots represent the bins with z-scores between −3 and 3. The red dots represent bins with z-scores <−3. The green dots represent bins with z-scores >3. The interval between two consecutive lines represents a z-score difference of 20.

Significant hypomethylation was observed in multiple regions across the genomes for patients with most types of cancers, including lung cancer, nasopharyngeal carcinoma, colorectal cancer and metastatic neuroendocrine tumor. Interestingly, in addition to hypomethylation, significant hypermethylation was observed in multiple regions across the genome in the case with metastatic smooth muscle sarcoma. The embryonic origin of the smooth muscle sarcoma is the mesoderm whereas the embryonic origin of the other types of cancers in the remaining 7 patients is the ectoderm. Therefore, it is possible that the DNA methylation pattern of sarcoma may be different from that of carcinoma.

As can be seen from this case, the methylation pattern of plasma DNA can also be useful for differentiating different types of cancer, which in this example is a differentiation of carcinoma and sarcoma. These data also suggest that the approach could be used to detect aberrant hypermethylation associated with the malignancy. For all these 8 cases, only plasma samples were available and no tumor tissue had been analyzed. This showed that even without the prior methylation profile or methylation levels of the tumor tissue, tumor-derived DNA can be readily detected in plasma using the methods described.

FIG. 27J is a table 2790 showing a distribution of the z-scores of the 1 Mb bins for the whole genome in plasma of patients with different malignancies. The percentages of bins with z-score <−3, −3 to 3 and >3 are shown for each case. More than 5% of the bins had a z-score of <−3 for all the cases. Therefore, if we use a cutoff of 5% of the bins being significantly hypomethylated for classifying a sample as positive for cancer, then all of these cases would be classified as positive for cancer. Our results show that hypomethylation is likely to be a general phenomenon for different types of cancers and that plasma methylome analysis would be useful for detecting different types of cancers.

D. Method

FIG. 28 is a flowchart of method 2800 of analyzing a biological sample of an organism to determine a classification of a level of cancer according to embodiments of the present invention. The biological sample includes DNA originating from normal cells and may potentially include DNA from cells associated with cancer. At least some of the DNA may be cell-free in the biological sample.

At block 2810, a plurality of DNA molecules from the biological sample are analyzed. The analysis of a DNA molecule can include determining a location of the DNA molecule in a genome of the organism and determining whether the DNA molecule is methylated at one or more sites. The analysis can be performed by receiving sequence reads from a methylation-aware sequencing, and thus the analysis can be performed just on data previously obtained from the DNA. In other embodiments, the analysis can include the actual sequencing or other active steps of obtaining the data.

At block 2820, a respective number of DNA molecules that are methylated at the site is determined for each of a plurality of sites. In one embodiment, the sites are CpG sites, and may be only certain CpG sites, as selected using one or more criteria mentioned herein. Determining the number of DNA molecules that are methylated is equivalent to determining the number that are unmethylated once normalization is performed using a total number of DNA molecules analyzed at a particular site, e.g., a total number of sequence reads. For example, an increase in the CpG methylation density of a region is equivalent to a decrease in the density of unmethylated CpGs of the same region.

At block 2830, a first methylation level is calculated based on the respective numbers of DNA molecules methylated at the plurality of sites. The first methylation level can correspond to a methylation density that is determined based on the number of DNA molecules corresponding to the plurality of sites. The sites can correspond to a plurality of loci or just one locus.
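
Blocks 2820 and 2830 can be illustrated with a short sketch. It assumes that each site contributes a pair of counts (methylated reads, total reads) and that the methylation level of a region is obtained by pooling those counts across sites; the pooling rule, the function name `methylation_density` and the toy counts are assumptions made for illustration.

```python
def methylation_density(site_counts):
    """Pooled methylation density for a list of sites.

    site_counts: list of (methylated_reads, total_reads), one pair per site.
    """
    methylated = sum(m for m, _ in site_counts)
    total = sum(t for _, t in site_counts)
    if total == 0:
        raise ValueError("no reads cover the selected sites")
    return methylated / total


# Toy counts (hypothetical) for three CpG sites in one region.
sites = [(8, 10), (3, 5), (4, 5)]
density = methylation_density(sites)

# The same calculation on unmethylated counts gives the complementary value,
# which is why counting methylated or unmethylated molecules is equivalent
# once the totals are used for normalization.
unmethylated = methylation_density([(t - m, t) for m, t in sites])
```

Here the two values sum to 1, so an increase in one is necessarily a decrease in the other.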

At block 2840, the first methylation level is compared to a first cutoff value. The first cutoff value may be a reference methylation level or be related to a reference methylation level (e.g., a specified distance from a normal level). The reference methylation level may be determined from samples of individuals without cancer or from loci of the organism that are known to not be associated with a cancer of the organism. The first cutoff value may be established from a reference methylation level determined from a previous biological sample of the organism obtained previous to the biological sample being tested.

In one embodiment, the first cutoff value is a specified distance (e.g., a specified number of standard deviations) from a reference methylation level established from a biological sample obtained from a healthy organism. The comparison can be performed by determining a difference between the first methylation level and a reference methylation level, and then comparing the difference to a threshold corresponding to the first cutoff value (e.g., to determine if the methylation level is statistically different than the reference methylation level).

At block 2850, a classification of a level of cancer is determined based on the comparison. Examples of a level of cancer include whether the subject has cancer or a premalignant condition, or an increased likelihood of developing cancer. In one embodiment, the first cutoff value may be determined from a previously obtained sample from the subject (e.g., a reference methylation level may be determined from the previous sample).

In some embodiments, the first methylation level can correspond to a number of regions whose methylation levels exceed a threshold value. For example, a plurality of regions of a genome of the organism can be identified. The regions can be identified using criteria mentioned herein, e.g., of certain lengths or certain numbers of sites. One or more sites (e.g., CpG sites) can be identified within each of the regions. A region methylation level can be calculated for each region. The first methylation level is for a first region. Each of the region methylation levels is compared to a respective region cutoff value, which may be the same or vary among regions. The region cutoff value for the first region is the first cutoff value. The respective region cutoff values can be a specified amount (e.g., 0.5) from a reference methylation level, thereby counting only regions that have a significant difference from a reference, which may be determined from non-cancer subjects.

A first number of regions whose region methylation level exceeds the respective region cutoff value can be determined, and compared to a threshold value to determine the classification. In one implementation, the threshold value is a percentage. Comparing the first number to a threshold value can include dividing the first number of regions by a second number of regions (e.g., all of the regions) before comparing to the threshold value, e.g., as part of a normalization process.

As described above, a fractional concentration of tumor DNA in the biological sample can be used to calculate the first cutoff value. The fractional concentration can simply be estimated to be greater than a minimum value, whereas a sample with less can be flagged, e.g., as not being suitable for analysis. The minimum value can be determined based on an expected difference in methylation levels for a tumor relative to a reference methylation level. For example, if a difference is 0.5 (e.g., as used as a cutoff value), then the tumor concentration would need to be high enough for this difference to be seen.

Specific techniques from method 1300 can be applied for method 2800. In method 1300, copy number variations can be determined for a tumor (e.g., where the first chromosomal region of a tumor can be tested for having a copy number change relative to a second chromosomal region of the tumor). Thus, method 1300 can presume that a tumor exists. In method 2800, a sample can be tested for whether there is an indication that any tumor exists at all, regardless of any copy number characteristics. Some techniques of the two methods can be similar. However, the cutoff values and methylation parameters (e.g., normalized methylation levels) for method 2800 can detect a statistical difference from a reference methylation level for non-cancer DNA as opposed to a difference from a reference methylation level for a mixture of cancer DNA and non-cancer DNA with some regions possibly having copy number variations. Thus, the reference values for method 2800 can be determined from samples without cancer, such as from organisms without cancer or from non-cancer tissue of the same patient (e.g., plasma taken previously or from contemporaneously acquired samples that are known to not have cancer, which may be determined from cellular DNA).

E. Prediction of the Minimal Fractional Concentration of Tumor-DNA to be Detected Using Plasma DNA Methylation Analysis

One way to measure the sensitivity of the approach to detect cancer using the methylation level of plasma DNA is related to the minimal fractional tumor-derived DNA concentration that is required to reveal a change in plasma DNA methylation level when compared with those of controls. The test sensitivity is also dependent on the extent of difference in DNA methylation between the tumor tissue and baseline plasma DNA methylation levels in healthy controls or blood cell DNA. Blood cells are the predominant source of DNA in plasma of healthy individuals. The larger the difference, the more easily the cancer patients can be discriminated from the non-cancer individuals, which would be reflected as a lower detection limit of tumor-derived DNA in plasma and a higher clinical sensitivity in detecting the cancer patients. In addition, the variations in the plasma DNA methylation in the healthy subjects or in subjects with different ages (G. Hannum et al. 2013 Mol Cell; 49: 359-367) would also affect the sensitivity of detecting the methylation changes associated with the presence of a cancer. A smaller variation in the plasma DNA methylation in the healthy subjects would make the detection of the change caused by the presence of a small amount of cancer-derived DNA easier.

FIG. 29A is a plot 2900 showing the distribution of the methylation densities in reference subjects assuming that this distribution follows a normal distribution. This analysis assumes that each plasma sample provides only one methylation density value, for example, the methylation density of all autosomes or of a particular chromosome. It illustrates how the specificity of the analysis would be affected. In one embodiment, a cutoff of 3 SDs below the mean DNA methylation density of the reference subjects is used to determine if a tested sample is significantly more hypomethylated than samples from the reference subjects. When this cutoff is used, it is expected that approximately 0.15% of non-cancer subjects would have false-positive results of being classified as having cancer, resulting in a specificity of 99.85%.

FIG. 29B is a plot 2950 showing the distributions of methylation densities in reference subjects and cancer patients. The cutoff value is 3 SDs below the mean of the methylation densities of the reference subjects. If the mean of methylation densities of the cancer patients is 2 SDs below the cutoff value (i.e. 5 SDs below the mean of the reference subjects), 97.5% of the cancer subjects would be expected to have a methylation density below the cutoff value. In other words, the expected sensitivity would be 97.5% if one methylation density value is provided for each subject, for example when the total methylation density of the whole genome, of all autosomes or of a particular chromosome is analyzed. The difference between the mean methylation densities of the two populations is affected by two factors, namely the degree of difference in the methylation level between cancer and non-cancer tissues and the fractional concentration of tumor-derived DNA in the plasma sample. The higher the values of these two parameters, the larger the difference in the methylation densities of these two populations would be. In addition, the lower the SD of the distributions of methylation densities of the two populations, the smaller the overlap between the distributions of the methylation densities of the two populations.
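
The specificity and sensitivity figures above follow from tail areas of the normal distribution, and can be checked with the Python standard library. The sketch assumes that the cancer and reference populations share the same SD, and works in units of that SD; the variable names are illustrative. The exact tail areas come out at roughly 99.87% and 97.7%, slightly above the approximate values of 99.85% and 97.5% quoted in the text.

```python
from statistics import NormalDist

reference = NormalDist(mu=0.0, sigma=1.0)  # reference subjects, in SD units

cutoff_z = -3.0  # cutoff: 3 SDs below the reference mean
specificity = 1.0 - reference.cdf(cutoff_z)

# Assumption: cancer patients have the same SD, with a mean 2 SDs below
# the cutoff, i.e. 5 SDs below the reference mean.
cancer = NormalDist(mu=-5.0, sigma=1.0)
sensitivity = cancer.cdf(cutoff_z)
```

Moving the cancer mean closer to the reference mean, or widening either distribution, reduces `sensitivity` for the same cutoff.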

Here we use a hypothetical example to illustrate this concept. Let us assume that the methylation density of the tumor tissue is approximately 0.45 and that of the plasma DNA of the healthy subjects is approximately 0.7. These assumed values are similar to those obtained from our HCC patient, where the overall methylation density of the autosomes in the tumor tissue was 42.9% and the mean methylation density of the autosomes for the plasma samples from healthy controls was 71.6%. Assuming that the CV of measuring the plasma DNA methylation density for the whole genome is 1%, the cutoff value would be 0.7×(100%−3×1%)=0.679. To achieve a sensitivity of 97.5%, the mean methylation density of the plasma DNA for the cancer patients needs to be approximately 0.679−0.7×(2×1%)=0.665. Let f represent the fractional concentration of tumor-derived DNA in the plasma sample. Then f can be calculated from (0.7−0.45)×f=0.7−0.665. Therefore, f is approximately 14%. From this calculation, it is estimated that the minimal fractional concentration that can be detected in the plasma is 14% so as to achieve a diagnostic sensitivity of 97.5% if the total methylation density of the whole genome is used as the diagnostic parameter.

Next we performed this analysis on the data obtained from the HCC patient. For this illustration, only one methylation density measurement based on the value estimated from all autosomes was made for each sample. The mean methylation density was 71.6% among the plasma samples obtained from the healthy subjects. The SD of the methylation densities of these four samples was 0.631%. Therefore, the cutoff value for plasma methylation density would need to be 71.6%−3×0.631%=69.7% to reach a z-score <−3 and a specificity of 99.85%. To achieve a sensitivity of 97.5%, the mean plasma methylation density of the cancer patients would need to be 2 SDs below the cutoff, i.e. 68.4%. Since the methylation density of the tumor tissue was 42.9%, and using the formula P=BKG×(1−f)+TUM×f, where P is the plasma methylation density, BKG is the background methylation density of the healthy plasma and TUM is the methylation density of the tumor tissue, f would need to be at least 11.1%.
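
Both the hypothetical example and the HCC calculation rearrange the mixture formula P=BKG×(1−f)+TUM×f to solve for f. The sketch below reproduces the two calculations; the function name and parameter names are hypothetical. Because it carries unrounded intermediate values, the HCC case comes out near 11.0%, compared with the 11.1% obtained in the text from the rounded target of 68.4%.

```python
def min_tumor_fraction(background, tumor, sd, spec_sds=3.0, sens_sds=2.0):
    """Minimal tumor DNA fraction f for a single-value methylation test.

    background: mean plasma methylation density of the reference subjects.
    tumor: methylation density of the tumor tissue.
    sd: SD of the plasma methylation density among reference subjects.
    """
    cutoff = background - spec_sds * sd   # 3 SDs below the reference mean
    target = cutoff - sens_sds * sd       # required mean in cancer patients
    # Rearranged from target = background * (1 - f) + tumor * f.
    return (background - target) / (background - tumor)


# Hypothetical example: densities 0.7 and 0.45, CV of 1% (SD = 0.7 * 0.01).
f_hypothetical = min_tumor_fraction(0.7, 0.45, 0.7 * 0.01)

# HCC patient data: 71.6% background, 42.9% tumor, SD of 0.631%.
f_hcc = min_tumor_fraction(0.716, 0.429, 0.00631)
```

The result scales linearly with the SD, so halving the variation among reference subjects halves the minimal detectable fraction.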

In another embodiment, the methylation densities of different genomic regions can be analyzed separately as shown in FIG. 3 or 4. In other words, multiple measurements of the methylation level were made for each sample. As shown below, significant hypomethylation could be detected at much lower fractional tumor DNA concentration in plasma and thus the diagnostic performance of the plasma DNA methylation analysis for cancer detection would be enhanced. The number of genomic regions showing a significant deviation in methylation densities from the reference population can be counted. Then the number of genomic regions can be compared to a cutoff value to determine if there is an overall significant hypomethylation of plasma DNA across the population of genomic regions surveyed, for example, the 1 Mb bins of the whole genome. The cutoff value can be established by the analysis of a group of reference subjects without a cancer or derived mathematically, for example, according to the normal distribution function.

FIG. 30 is a plot 3000 showing the distribution of methylation densities of the plasma DNA of healthy subjects and cancer patients. The methylation density of each 1 Mb bin is compared with the corresponding values of the reference group. The percentage of bins showing significant hypomethylation (3 SDs below the mean of the reference group) was determined. A cutoff of 10% being significantly hypomethylated was used to determine if tumor-derived DNA is present in the plasma sample. Other cutoff values such as 5%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80% or 90% can also be used according to the desired sensitivity and specificity of the test.

For example, to classify a sample as containing tumor-derived DNA, we can use 10% of the 1 Mb bins showing significant hypomethylation (z-score <−3) as a cutoff. If more than 10% of the bins are significantly more hypomethylated than the reference group, then the sample is classified as positive for the cancer test. For each 1 Mb bin, a cutoff of 3 SDs below the mean methylation density of the reference group is used to define a sample as significantly more hypomethylated. For each of the 1 Mb bins, if the mean plasma DNA methylation density of the cancer patients is 1.72 SDs lower than the mean plasma DNA methylation densities of the reference subjects, then there is a 10% chance that the methylation density value of any particular bin of a cancer patient would be lower than the cutoff (i.e. z-score <−3) and give a positive result. Then, if we look at all the 1 Mb bins for the whole genome, approximately 10% of the bins would be expected to show positive results of having significantly lower methylation densities (i.e. z-scores <−3). Assuming that the overall methylation density of the plasma DNA of the healthy subjects is approximately 0.7 and the coefficient of variation (CV) of measuring the plasma DNA methylation density for each 1 Mb bin is 1%, the mean methylation density of the plasma DNA of the cancer patients would need to be 0.7×(100%−1.72×1%)=0.68796. Let f be the fractional concentration of tumor-derived DNA in plasma so as to achieve this mean plasma DNA methylation density. Assuming that the methylation density of the tumor tissue is 0.45, then f can be calculated using the equation:

$$(\overline{M}_{P_{\mathrm{ref}}} - M_{\mathrm{tumor}}) \times f = \overline{M}_{P_{\mathrm{ref}}} - \overline{M}_{P_{\mathrm{cancer}}}$$

where $\overline{M}_{P_{\mathrm{ref}}}$ represents the mean methylation density of plasma DNA in the reference individuals; $M_{\mathrm{tumor}}$ represents the methylation density of the tumor tissue in the cancer patient; and $\overline{M}_{P_{\mathrm{cancer}}}$ represents the mean methylation density of plasma DNA in the cancer patients.

Using this equation, (0.7−0.45)×f=0.7−0.68796. Thus, the minimal fractional concentration that can be detected using this approach would be deduced as 4.8%. The sensitivity can be further enhanced by decreasing the cutoff percentage of bins being significantly more hypomethylated, for example, from 10% to 5%.

As shown in the above example, the sensitivity of this method is determined by the degree of difference in methylation level between cancer and non-cancer tissues, for example, blood cells. In one embodiment, only the chromosomal regions which show a large difference in methylation densities between the plasma DNA of the non-cancer subjects and the tumor tissue are selected. In one embodiment, only regions with a difference in methylation density of >0.5 are selected. In other embodiments, a difference of 0.4, 0.6, 0.7, 0.8 or 0.9 can be used for selecting the suitable regions. In yet another embodiment, the physical size of the genomic regions is not fixed. Instead, the genomic regions are defined, for example, based on a fixed read depth or a fixed number of CpG sites. The methylation levels at multiple of these genomic regions are assessed for each sample.

FIG. 31 is a graph 3100 showing the distribution of the differences in methylation densities between the mean of the plasma DNA of healthy subjects and the tumor tissue of the HCC patient. A positive value signifies that the methylation density is higher in the plasma DNA of the healthy subjects and a negative value signifies that the methylation density is higher in the tumor tissue.

In one embodiment, the bins with the greatest difference between the methylation density of the cancer and non-cancer tissues can be selected, for example, those with a difference of >0.5, regardless of whether the tumor is hypomethylated or hypermethylated for these bins. The detection limit of fractional concentration of tumor-derived DNA in plasma can be lowered by focusing on these bins because of the greater differences between the distributions of the plasma DNA methylation levels between cancer and non-cancer subjects given the same fractional concentration of tumor-derived DNA in the plasma. For example, if only bins with differences >0.5 are used and a cutoff of 10% of the bins being significantly more hypomethylated is adopted to determine if a tested individual has a cancer, the minimal fractional concentration (f) of tumor-derived DNA detected can be calculated using the following equation:

$$(\overline{M}_{P_{\mathrm{ref}}} - M_{\mathrm{tumor}}) \times f = \overline{M}_{P_{\mathrm{ref}}} - \overline{M}_{P_{\mathrm{cancer}}},$$

where $\overline{M}_{P_{\mathrm{ref}}}$ represents the mean methylation density of plasma DNA in the reference individuals; $M_{\mathrm{tumor}}$ represents the methylation density of the tumor tissue in the cancer patient; and $\overline{M}_{P_{\mathrm{cancer}}}$ represents the mean methylation density of plasma DNA in the cancer patients.

Here the difference in methylation density between the plasma of the reference subjects and the tumor tissues is at least 0.5. We then have 0.5×f=0.7−0.68796 and f=2.4%. Therefore, by focusing on bins with a higher difference in methylation density between cancer and non-cancer tissues, the lower limit of fractional tumor-derived DNA can be lowered from 4.8% to 2.4%. The information regarding which bins would show larger degrees of methylation differences between cancer and non-cancer tissues, for example, blood cells, could be determined from tumor tissues of the same organ or same histological type obtained from other individuals.
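
The 4.8% and 2.4% figures can be reproduced from the normal distribution. The sketch below assumes that per-bin methylation densities in cancer patients are normally distributed with the same SD as in the reference subjects; the function name and parameters are hypothetical. The required shift is the number of SDs by which the cancer mean must fall so that the chosen fraction of bins lies below z=−3, which for 10% is about 1.72 SDs.

```python
from statistics import NormalDist


def min_fraction_multi_bin(ref_density, density_difference, cv,
                           bin_fraction=0.10, z_cutoff=-3.0):
    """Minimal tumor fraction for the bin-counting test.

    ref_density: mean plasma methylation density of reference subjects.
    density_difference: reference plasma density minus tumor density.
    cv: coefficient of variation of the per-bin measurement.
    Returns (f, shift_sds).
    """
    # Quantile of the standard normal at bin_fraction (about -1.28 for 10%),
    # measured relative to the z-score cutoff of -3.
    shift_sds = NormalDist().inv_cdf(bin_fraction) - z_cutoff
    required_drop = ref_density * cv * shift_sds
    return required_drop / density_difference, shift_sds


f_all_bins, shift = min_fraction_multi_bin(0.7, 0.7 - 0.45, 0.01)
f_selected_bins, _ = min_fraction_multi_bin(0.7, 0.5, 0.01)
```

Passing `bin_fraction=0.05` gives a smaller required shift and hence a lower minimal fraction, matching the remark that lowering the cutoff percentage enhances sensitivity.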

In another embodiment, a parameter can be derived from the methylation density of the plasma DNA of all bins, taking into account the difference in methylation densities between cancer and non-cancer tissues. Bins with a greater difference can be given a heavier weight. In one embodiment, the difference in methylation density between cancer and non-cancer tissue of each bin can directly be used as the weight of the particular bin in calculating the final parameter.
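
One possible form of such a weighted parameter is sketched below. The text specifies only that the per-bin difference can serve as the weight; the choice to weight per-bin z-scores and to normalize by the sum of absolute weights is an assumption made for illustration, as are the function name and the toy values.

```python
def weighted_methylation_score(bin_z_scores, tissue_differences):
    """Weighted average of per-bin z-scores.

    bin_z_scores: z-score of the test plasma sample for each bin.
    tissue_differences: per-bin difference in methylation density between
        non-cancer and cancer tissue, used directly as the weight.
    """
    if len(bin_z_scores) != len(tissue_differences):
        raise ValueError("one weight is required per bin")
    total_weight = sum(abs(d) for d in tissue_differences)
    if total_weight == 0:
        raise ValueError("all weights are zero")
    weighted_sum = sum(d * z for d, z in zip(tissue_differences, bin_z_scores))
    return weighted_sum / total_weight


# Toy values (hypothetical): the bin with the larger tissue difference
# dominates the score.
score = weighted_methylation_score([-4.0, -1.0], [0.6, 0.2])
unweighted = (-4.0 + -1.0) / 2
```

In this toy case the weighted score is more negative than the unweighted mean because the strongly hypomethylated bin also carries the larger weight.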

In yet another embodiment, different types of cancer may have different patterns of methylation in the tumor tissue. A cancer-specific weight profile can be derived from the degree of methylation of the specific type of cancer.

In yet another embodiment, the inter-bin relationship of methylation density can be determined in subjects with and without cancer. In FIG. 8, we can observe that in a small number of bins, the tumor tissues were more methylated than the plasma DNA of the reference subjects. Thus, the bins with the most extreme values of difference, e.g. difference >0.5 and difference <0, can be selected. The ratio of the methylation density of these bins can then be used to indicate if the tested individual has a cancer. In other embodiments, the difference and quotient of the methylation density of different bins can be used as parameters for indicating the inter-bin relationship.

We further assessed the detection sensitivity of the approach to detect or assess tumor using the methylation densities of multiple genomic regions, as illustrated by the data obtained from the HCC patient. First, we mixed reads from the pre-operative plasma with those obtained from the plasma samples of the healthy controls to simulate plasma samples that contained fractional tumor DNA that ranged from 20% to 0.5%. We then scored the percentage of 1 Mb bins (out of 2,734 bins in the whole genome) with methylation densities equivalent to z-scores <−3. When the fractional tumor DNA concentration in plasma was 20%, 80.0% of the bins showed significant hypomethylation. The corresponding data for fractional tumor DNA concentration in plasma of 10%, 5%, 2%, 1% and 0.5% were 67.6%, 49.7%, 18.9%, 3.8% and 0.77% of the bins showing hypomethylation, respectively. Since the theoretical limit of the number of bins showing z-scores <−3 in the control samples is 0.15%, our data show that there were still more bins (0.77%) beyond the theoretical cutoff limit even when the tumor fractional concentration was just 0.5%.
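As a non-limiting illustration, the scoring of bins with z-scores <−3 might be sketched as follows. The function names and example values are hypothetical, and the z-score is assumed to be computed per bin against the mean and sample standard deviation of the control plasma samples.

```python
from statistics import mean, stdev


def bin_z_scores(test_density, control_densities):
    """Per-bin z-score of a test sample against control samples.

    control_densities is a list of per-control lists, one value per bin.
    Assumption: z = (test - mean of controls) / sample SD of controls.
    A bin whose controls have zero SD would raise ZeroDivisionError.
    """
    scores = []
    for i, value in enumerate(test_density):
        ref = [control[i] for control in control_densities]
        scores.append((value - mean(ref)) / stdev(ref))
    return scores


def fraction_hypomethylated(z_scores, cutoff=-3.0):
    """Fraction of bins whose z-score falls below the cutoff."""
    return sum(1 for z in z_scores if z < cutoff) / len(z_scores)


# Hypothetical data: four controls and one test sample over two bins.
controls = [[0.70, 0.71], [0.72, 0.69], [0.71, 0.70], [0.69, 0.70]]
test_sample = [0.60, 0.70]
z = bin_z_scores(test_sample, controls)
fraction = fraction_hypomethylated(z)
```

The resulting fraction can then be compared with the proportion of bins expected below the cutoff in control samples.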

FIG. 32A is a table 3200 showing the effect of reducing the sequencing depth when the plasma sample contained 5% or 2% tumor DNA. A high proportion of bins (>0.15%) showing significant hypomethylation could still be detected when the mean sequencing depth was just 0.022 times the haploid genome.

FIG. 32B is a graph 3250 showing the methylation densities of the repeat elements and non-repeat regions in the plasma of the four healthy control subjects, the buffy coat, the normal liver tissue, the tumor tissue, the pre-operative plasma and the post-operative plasma samples of the HCC patient. It can be observed that the repeat elements were more methylated (higher methylation density) than the non-repeat regions in both cancer and non-cancer tissues. However, the difference in methylation between repeat elements and non-repeat regions was bigger in the non-cancer tissues and the plasma DNA of the healthy subjects when compared with the tumor tissues.

As a result, the plasma DNA of the cancer patient had a larger reduction in methylation density at the repeat elements than in the non-repeat regions. The difference in plasma DNA methylation density between the mean of the four healthy controls and the HCC patient was 0.163 and 0.088 for the repeat elements and the non-repeat regions, respectively. The data on the pre-operative and post-operative plasma samples also showed that the dynamic range in the change in methylation density was larger in the repeat elements than in the non-repeat regions. In one embodiment, the plasma DNA methylation density of the repeat elements can be used for determining if a patient is affected by cancer or for monitoring disease progression.

As discussed above, the variation in methylation densities in the plasma of the reference subjects would also affect the accuracy of differentiating cancer patients from non-cancer individuals. The tighter the distribution of methylation densities (i.e. smaller standard deviation), the more accurate it would be to differentiate cancer and non-cancer subjects. In another embodiment, the coefficient of variation (CV) of the methylation densities of the 1 Mb bins can be used as a criterion for selecting the bins with low variability of plasma DNA methylation densities in the reference group. For example, only bins with CV<1% are selected. Other values, for example 0.5%, 0.75%, 1.25% and 1.5% can also be used as criteria for selecting the bins with low variability in methylation density. In yet another embodiment, the selection criteria can include both the CV of the bin and the difference in methylation density between cancer and non-cancer tissues.
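As a non-limiting illustration, the CV-based bin selection might be sketched as follows. The function name and example values are hypothetical; the CV is assumed to be the sample standard deviation divided by the mean of the reference-group densities for each bin.

```python
from statistics import mean, stdev


def select_low_cv_bins(control_densities, max_cv=0.01):
    """Return indices of bins whose CV across the reference group is below max_cv.

    control_densities is a list of per-control lists, one value per bin.
    Assumption: CV = sample SD / mean, expressed as a fraction (0.01 = 1%).
    """
    n_bins = len(control_densities[0])
    selected = []
    for i in range(n_bins):
        ref = [control[i] for control in control_densities]
        if stdev(ref) / mean(ref) < max_cv:
            selected.append(i)
    return selected


# Hypothetical data: bin 0 is tightly distributed, bin 1 is highly variable.
controls = [[0.700, 0.60], [0.702, 0.70], [0.698, 0.80], [0.700, 0.70]]
low_cv_bins = select_low_cv_bins(controls)
```

A combined criterion, as in the last embodiment above, could additionally require a minimum cancer versus non-cancer difference for each selected bin.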

The methylation density can also be used to estimate the fractional concentration of tumor-derived DNA in a plasma sample when the methylation density of the tumor tissue is known. This information can be obtained by the analysis of the tumor of the patient or from the survey of the tumors from a number of patients having the same type of cancer. As discussed above, the plasma methylation density (P) can be expressed using the following equation: P=BKG×(1−f)+TUM×f, where BKG is the background methylation density from the blood cells and other organs, TUM is the methylation density in the tumor tissue, and f is the fractional concentration of tumor-derived DNA in the plasma sample. This can be rewritten as:

$f = \frac{BKG - P}{BKG - TUM}.$

The values of BKG can be determined by analyzing the patient's plasma sample at a time point when the cancer is not present or from the survey of a reference group of individuals without cancer. Therefore, after measuring the plasma methylation density, f can be determined.
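As a non-limiting illustration, the estimation of f from the rewritten equation might be sketched as follows. The function name is hypothetical, and the example values (BKG of 0.7, plasma density of 0.68796, and an assumed TUM of 0.2, giving a difference of 0.5) are chosen only to mirror the arithmetic discussed earlier.

```python
def tumor_fraction(bkg, plasma, tum):
    """Fractional concentration of tumor-derived DNA, f = (BKG - P) / (BKG - TUM)."""
    if bkg == tum:
        raise ValueError("BKG and TUM must differ for f to be defined")
    return (bkg - plasma) / (bkg - tum)


# Illustrative values only; TUM = 0.2 is an assumption.
f = tumor_fraction(bkg=0.7, plasma=0.68796, tum=0.2)

# Round trip through P = BKG x (1 - f) + TUM x f recovers the plasma density.
reconstructed_plasma = 0.7 * (1 - f) + 0.2 * f
```

With these values f is approximately 2.4%.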

F. Combination

The methylation analysis approach described in this invention can be used in combination with other methods that are based on the genetic changes of tumor-derived DNA in plasma. Examples of such methods include the analysis for cancer-associated chromosomal aberrations (K. C. Chan et al. 2013 Clin Chem; 59:211-224; R. J. Leary et al. 2012 Sci Transl Med; 4:162ra154) and cancer-associated single nucleotide variations in plasma (K. C. Chan et al. 2013 Clin Chem; 59:211-224). There are advantages of the methylation analysis approach over those genetic approaches.

As shown in FIG. 21A, the hypomethylation of the tumor DNA is a global phenomenon involving regions distributed across almost the entire genome. Therefore, the DNA fragments from all chromosomal regions would be informative regarding the potential contribution of the tumor-derived hypomethylated DNA to the plasma/serum DNA in the patient. In contrast, chromosomal aberrations (either amplification or deletion of a chromosomal region) are only present in some chromosomal regions, and the DNA fragments from the regions without a chromosomal aberration in the tumor tissue would not be informative in the analysis (K. C. Chan et al. 2013 Clin Chem; 59: 211-224). Similarly, only a few thousand single nucleotide alterations are observed in each cancer genome (K. C. Chan et al. 2013 Clin Chem; 59: 211-224). DNA fragments that do not overlap with these single nucleotide changes would not be informative in determining if tumor-derived DNA is present in the plasma. Therefore, this methylation analysis approach is potentially more cost-effective than those genetic approaches for detecting cancer-associated changes in the circulation.

In one embodiment, the cost-effectiveness of plasma DNA methylation analysis can further be enhanced by enriching for DNA fragments from the most informative regions, for example regions with the greatest difference in methylation between cancer and non-cancer tissues. Examples of methods for enriching for these regions include the use of hybridization probes (e.g. Nimblegen SeqCap system and Agilent SureSelect Target Enrichment system), PCR amplification and solid phase hybridization (e.g. Illumina TruSeq Enrichment kit).

G. Tissue-Specific Analysis/Donors

Tumor-derived cells invade and metastasize to adjacent or distant organs. The invaded tissues or metastatic foci contribute DNA into plasma as a result of cell death. By analyzing the methylation profile of DNA in the plasma of cancer patients and detecting the presence of tissue-specific methylation signatures, one could detect the types of tissues that are involved in the disease process. This approach provides a noninvasive anatomic scan of the tissues involved in the cancerous process to aid in the identification of the organs involved as the primary and metastatic sites. Monitoring the relative concentrations of the methylation signatures of the involved organs in plasma would also allow one to assess the tumor burden of those organs and determine if the cancer process in that organ is deteriorating, improving or has been cured. For example, if a gene X is specifically methylated in the liver, then metastatic involvement of the liver by a cancer (e.g. colorectal cancer) will be expected to increase the concentration of methylated sequences from gene X in the plasma. There would also be another sequence or groups of sequences with similar methylation characteristics as gene X. One could then combine the results from such sequences. Similar considerations are applicable to other tissues, e.g. the brain, bones, lungs and kidneys, etc.

On the other hand, DNA from different organs is known to exhibit tissue-specific methylation signatures (B. W. Futscher et al. 2002 Nat Genet; 31:175-179; S. S. C. Chim et al. 2008 Clin Chem; 54: 500-511). Thus, methylation profiling in plasma can be used for elucidating the contribution of tissues from various organs into plasma. The elucidation of such contribution can be used for assessing organ damage, as plasma DNA is believed to be released when cells die. For example, liver pathology such as hepatitis (e.g. caused by viruses, autoimmune processes, etc.) or hepatotoxicity (e.g. caused by drug overdose, such as by paracetamol, or by toxins, such as alcohol) is associated with liver cell damage and will be expected to be associated with an increased level of liver-derived DNA in plasma. For example, if a gene X is specifically methylated in the liver, then liver pathology will be expected to increase the concentration of methylated sequences from gene X in the plasma. Conversely, if a gene Y is specifically hypomethylated in the liver, then liver pathology will be expected to decrease the concentration of methylated sequences from gene Y in the plasma.

The presently described approach could also be applied to the assessment of donor-derived DNA in the plasma of organ transplantation recipients (Y. M. D. Lo et al. 1998 Lancet; 351:1329-1330). Polymorphic differences between the donor and recipient have been used to distinguish the donor-derived DNA from the recipient-derived DNA in plasma (Y. W. Zheng et al. 2012 Clin Chem; 58: 549-558). We propose that tissue-specific methylation signatures of the transplanted organ could also be used as a method to detect the donor's DNA in the recipient's plasma.

By monitoring the concentration of the donor's DNA, one could noninvasively assess the status of the transplanted organ. For example, transplant rejection is associated with a higher rate of cell death and hence the concentration of the donor's DNA, as reflected by the methylation signature of the transplanted organ, would be increased when compared with the time when the patient is in stable condition or when compared to other stable transplant recipients or healthy controls without transplantation. Similar to what has been described for cancer, the donor-derived DNA could be identified in the plasma of transplantation recipients by detecting all or some of the characteristic features, including polymorphic differences, shorter size DNA for the transplanted solid organs (Y. W. Zheng et al. 2012 Clin Chem; 58: 549-558) and tissue-specific methylation profile.

IX. Materials and Methods

A. Preparation of Bisulfite-Treated DNA Libraries and Sequencing

Genomic DNA (5 μg) added with 0.5% (w/w) unmethylated lambda DNA (Promega) was fragmented by a Covaris S220 System (Covaris) to approximately 200 bp in length. DNA libraries were prepared using the Paired-End Sequencing Sample Preparation Kit (Illumina) according to the manufacturer's instructions, except that methylated adapters (Illumina) were ligated to the DNA fragments. Following two rounds of purification using AMPure XP magnetic beads (Beckman Coulter), the ligation products were split into 2 portions, one of which was subjected to 2 rounds of bisulfite modification with an EpiTect Bisulfite Kit (Qiagen). Unmethylated cytosines at CpG sites in the inserts were converted to uracils while the methylated cytosines remained unchanged. The adapter-ligated DNA molecules, either treated or untreated with sodium bisulfite, were enriched by 10 cycles of PCR using the following recipe: 2.5 U PfuTurboCx hotstart DNA polymerase (Agilent Technologies), 1× PfuTurboCx reaction buffer, 25 μM dNTPs, 1 μl PCR Primer PE 1.0 and 1 μl PCR Primer PE 2.0 (Illumina) in a 50 μl reaction. The thermocycling profile was: 95° C. for 2 min, 98° C. for 30 s, then 10 cycles of 98° C. for 15 s, 60° C. for 30 s and 72° C. for 4 min, with a final step of 72° C. for 10 min (R. Lister, et al. 2009 Nature 462, 315-322). The PCR products were purified using AMPure XP magnetic beads.

Plasma DNA extracted from 3.2-4 ml of maternal plasma samples was spiked with fragmented lambda DNA (25 pg per ml plasma) and subjected to library construction as described above (R. W. K. Chiu et al. 2011 BMJ; 342: c7401). After ligating to the methylated adapters, the ligation products were split into 2 halves, one of which was subjected to 2 rounds of bisulfite modification. The bisulfite-treated or untreated ligation products were then enriched by 10 cycles of PCR as described above.

Bisulfite-treated or untreated DNA libraries were sequenced for 75 bp in a paired-end format on HiSeq2000 instruments (Illumina). DNA clusters were generated with a Paired-End Cluster Generation Kit v3 on a cBot instrument (Illumina). Real-time image analysis and base calling were performed using the HiSeq Control Software (HCS) v1.4 and Real Time Analysis (RTA) Software v1.13 (Illumina), by which the automated matrix and phasing calculations were based on the spiked-in PhiX control v3 sequenced with the DNA libraries.

B. Sequence Alignment and Identification of Methylated Cytosines

After base calling, adapter sequences and low quality bases (i.e. quality score <20) on the fragment ends were removed. The trimmed reads in FASTQ format were then processed by a methylation data analysis pipeline called Methy-Pipe (P. Jiang, et al. Methy-Pipe: An integrated bioinformatics data analysis pipeline for whole genome methylome analysis, paper presented at the IEEE International Conference on Bioinformatics and Biomedicine Workshops, Hong Kong, 18 to 21 Dec. 2010). In order to align the bisulfite converted sequencing reads, we first performed in silico conversion of all cytosine residues to thymines, on the Watson and Crick strands separately, using the reference human genome (NCBI build 36/hg18). We then performed in silico conversion of each cytosine to thymine in all the processed reads and kept the position information of each converted residue. SOAP2 (R. Li, et al. 2009 Bioinformatics; 25: 1966-1967) was used to align the converted reads to the two pre-converted reference human genomes, with a maximum of two mismatches allowed for each aligned read. Only reads mappable to a unique genomic location were selected. Ambiguous reads which mapped to both the Watson and Crick strands and duplicated (clonal) reads which had the same start and end genomic positions were removed. Sequenced reads with insert size ≤600 bp were retained for the methylation and size analyses.
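As a non-limiting illustration, the in silico conversion steps might be sketched as follows. This is not the Methy-Pipe implementation; the function names are hypothetical, sequences are assumed to be uppercase strings over A, C, G and T, and the Crick-strand reference is assumed to be represented as the reverse complement of the Watson strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def convert_reference(reference):
    """Return the two pre-converted references (Watson, Crick), each with C converted to T."""
    watson = reference.replace("C", "T")
    crick = reverse_complement(reference).replace("C", "T")
    return watson, crick


def convert_read(read):
    """Convert every C in a read to T and keep the positions of the converted residues."""
    positions = [i for i, base in enumerate(read) if base == "C"]
    return read.replace("C", "T"), positions


def restore_cytosines(converted_read, positions):
    """Recover the cytosines originally present on the read after alignment."""
    bases = list(converted_read)
    for i in positions:
        bases[i] = "C"
    return "".join(bases)


watson_ref, crick_ref = convert_reference("AACG")
converted, c_positions = convert_read("ACGTCCGA")
restored = restore_cytosines(converted, c_positions)
```

Keeping the positions separately lets the fully converted read be aligned to the converted references and the original cytosines be restored afterwards.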

Cytosine residues in the CpG dinucleotide context were the major targets for the downstream DNA methylation studies. After alignment, the cytosines originally present on the sequenced reads were recovered based on the positional information kept during the in silico conversion. The recovered cytosines among the CpG dinucleotides were scored as methylated. Thymines among the CpG dinucleotides were scored as unmethylated. The unmethylated lambda DNA included during library preparation served as an internal control for estimating the efficiency of sodium bisulfite modification. All cytosines on the lambda DNA should have been converted to thymines if the bisulfite conversion efficiency was 100%.
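As a non-limiting illustration, the scoring of CpG cytosines and the estimation of bisulfite conversion efficiency from the lambda DNA control might be sketched as follows. The function names and example values are hypothetical, and the read is assumed to be an ungapped Watson-strand alignment starting at a known reference position.

```python
def score_cpg_sites(reference, read, start):
    """Score CpG cytosines covered by a read as methylated (C) or unmethylated (T).

    Assumption: the read aligns without gaps to the Watson strand of
    `reference`, beginning at 0-based position `start`.
    Returns a list of (reference_position, is_methylated) tuples.
    """
    calls = []
    for offset, base in enumerate(read):
        pos = start + offset
        if reference[pos:pos + 2] == "CG":
            if base == "C":
                calls.append((pos, True))
            elif base == "T":
                calls.append((pos, False))
    return calls


def conversion_efficiency(converted_cytosines, unconverted_cytosines):
    """Fraction of lambda DNA cytosines read as thymine after bisulfite treatment."""
    total = converted_cytosines + unconverted_cytosines
    if total == 0:
        raise ValueError("no lambda cytosines observed")
    return converted_cytosines / total


calls = score_cpg_sites("ACGTTCGA", "ACGTTTGA", start=0)
efficiency = conversion_efficiency(999, 1)
```

Because the spiked-in lambda DNA is unmethylated, any cytosine remaining on lambda reads reflects incomplete conversion.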

X. Summary

With the use of embodiments described herein, one could screen, detect, monitor or prognosticate cancer noninvasively using, for example, the plasma of a subject. One could also carry out prenatal screening, diagnosis, investigation or monitoring of a fetus by deducing the methylation profile of fetal DNA from maternal plasma. To illustrate the power of the approach, we showed that information that was conventionally obtained via the study of placental tissues could be assessed directly from maternal plasma. For example, the imprinting status of gene loci, identification of loci with differential methylation between the fetal and maternal DNA and the gestational variation in the methylation profile of gene loci were achieved through the direct analysis of maternal plasma DNA. The major advantage of our approach is that the fetal methylome could be assessed comprehensively during pregnancy without disruption to the pregnancy or the need for invasive sampling of fetal tissues. Given the known association between altered DNA methylation status and the many pregnancy-associated conditions, the approach described in this study can serve as an important tool for investigating the pathophysiology of, and identifying biomarkers for, those conditions. By focusing on the imprinted loci, we showed that both the paternally-transmitted as well as the maternally-transmitted fetal methylation profiles could be assessed from maternal plasma. This approach may potentially be useful for the investigation of imprinting diseases. Embodiments can also be applied directly for the prenatal assessment of fetal or pregnancy-associated diseases.

This is also the first study where genome-wide bisulfite sequencing has been applied to investigate the DNA methylation profile of placental tissues. There are approximately 28M CpG sites in the human genome (C. Clark, et al. 2012 PLoS One; 7: e50233). Our bisulfite sequencing data of the CVS and term placental tissue sample covered more than 80% of the CpGs. This represents a substantially broader coverage than that achievable using other high-throughput platforms. For example, the Illumina Infinium HumanMethylation 27K beadchip array that was used in a previous study on placental tissues (T. Chu, et al. 2011 PLoS One; 6: e14723) only covered 0.1% of the CpGs in the genome. The Illumina Infinium HumanMethylation 450K beadchip array that was available more recently only covered 1.7% of the CpGs (C. Clark, et al. 2012). Because the MPS approach is free from restrictions related to probe design, hybridization efficiency or strength of antibody capture, CpGs within or beyond CpG islands and in most sequence contexts could be assessed.

XI. Computer System

Any of the computer systems mentioned herein may utilize any suitable number of subsystems. Examples of such subsystems are shown in FIG. 33 in computer apparatus 3300. In some embodiments, a computer system includes a single computer apparatus, where the subsystems can be the components of the computer apparatus. In other embodiments, a computer system can include multiple computer apparatuses, each being a subsystem, with internal components.

The subsystems shown in FIG. 33 are interconnected via a system bus 3375. Additional subsystems such as a printer 3374, keyboard 3378, storage device(s) 3379, monitor 3376, which is coupled to display adapter 3382, and others are shown. Peripherals and input/output (I/O) devices, which couple to I/O controller 3371, can be connected to the computer system by any number of means known in the art, such as serial port 3377. For example, serial port 3377 or external interface 3381 (e.g. Ethernet, Wi-Fi, etc.) can be used to connect computer system 3300 to a wide area network such as the Internet, a mouse input device, or a scanner. The interconnection via system bus 3375 allows the central processor 3373 to communicate with each subsystem and to control the execution of instructions from system memory 3372 or the storage device(s) 3379 (e.g., a fixed disk), as well as the exchange of information between subsystems. The system memory 3372 and/or the storage device(s) 3379 may embody a computer readable medium. Any of the values mentioned herein can be output from one component to another component and can be output to the user.

A computer system can include a plurality of the same components or subsystems, e.g., connected together by external interface 3381 or by an internal interface. In some embodiments, computer systems, subsystems, or apparatuses can communicate over a network. In such instances, one computer can be considered a client and another computer a server, where each can be part of a same computer system. A client and a server can each include multiple systems, subsystems, or components.

It should be understood that any of the embodiments of the present invention can be implemented in the form of control logic using hardware (e.g. an application specific integrated circuit or field programmable gate array) and/or using computer software with a generally programmable processor in a modular or integrated manner. As used herein, a processor includes a multi-core processor on a same integrated chip, or multiple processing units on a single circuit board or networked. Based on the disclosure and teachings provided herein, a person of ordinary skill in the art will know and appreciate other ways and/or methods to implement embodiments of the present invention using hardware and a combination of hardware and software.

Any of the software components or functions described in this application may be implemented as software code to be executed by a processor using any suitable computer language such as, for example, Java, C++ or Perl using, for example, conventional or object-oriented techniques. The software code may be stored as a series of instructions or commands on a computer readable medium for storage and/or transmission. Suitable media include random access memory (RAM), a read only memory (ROM), a magnetic medium such as a hard-drive or a floppy disk, or an optical medium such as a compact disk (CD) or DVD (digital versatile disk), flash memory, and the like. The computer readable medium may be any combination of such storage or transmission devices.

Such programs may also be encoded and transmitted using carrier signals adapted for transmission via wired, optical, and/or wireless networks conforming to a variety of protocols, including the Internet. As such, a computer readable medium according to an embodiment of the present invention may be created using a data signal encoded with such programs. Computer readable media encoded with the program code may be packaged with a compatible device or provided separately from other devices (e.g., via Internet download). Any such computer readable medium may reside on or within a single computer program product (e.g. a hard drive, a CD, or an entire computer system), and may be present on or within different computer program products within a system or network. A computer system may include a monitor, printer, or other suitable display for providing any of the results mentioned herein to a user.

Any of the methods described herein may be totally or partially performed with a computer system including one or more processors, which can be configured to perform the steps. Thus, embodiments can be directed to computer systems configured to perform the steps of any of the methods described herein, potentially with different components performing a respective step or a respective group of steps. Although presented as numbered steps, steps of methods herein can be performed at a same time or in a different order. Additionally, portions of these steps may be used with portions of other steps from other methods. Also, all or portions of a step may be optional. Additionally, any of the steps of any of the methods can be performed with modules, circuits, or other means for performing these steps.

The specific details of particular embodiments may be combined in any suitable manner without departing from the spirit and scope of embodiments of the invention. However, other embodiments of the invention may be directed to specific embodiments relating to each individual aspect, or specific combinations of these individual aspects.

The above description of exemplary embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form described, and many modifications and variations are possible in light of the teaching above. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications to thereby enable others skilled in the art to best utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

A recitation of “a”, “an” or “the” is intended to mean “one or more” unless specifically indicated to the contrary.

All patents, patent applications, publications, and descriptions mentioned here are incorporated by reference in their entirety for all purposes. None is admitted to be prior art.

What is claimed is:
 1. A method of analyzing a biological sample from a subject, the method comprising: (a) obtaining sequence reads for cell-free DNA molecules from the biological sample of the subject, wherein the sequence reads include methylation statuses for the cell-free DNA molecules at single nucleotide resolution, and wherein the sequence reads comprise at least 100,000 sequence reads; (b) analyzing, including aligning to a reference genome, the at least 100,000 sequence reads to determine a methylation profile for a plurality of sites based on the methylation statuses for the plurality of sites; and (c) determining a tissue source for at least a portion of the cell-free DNA molecules from the biological sample based, at least in part, on the methylation profile.
 2. The method of claim 1, wherein determining the tissue source comprises: comparing the methylation profile to one or more reference methylation profiles.
 3. The method of claim 2, wherein at least one of the one or more reference methylation profiles is determined from methylation statuses of one or more reference samples obtained from another subject known to have cancer.
 4. The method of claim 3, wherein at least another one of the one or more reference methylation profiles is obtained from methylation statuses of at least one other sample obtained from a healthy subject.
 5. The method of claim 2, wherein the comparison of the methylation profile to the one or more reference methylation profiles detects changes in methylation status of CpG islands.
 6. The method of claim 1, wherein the methylation profile comprises a pattern of the cell-free DNA molecules that are methylated at the plurality of sites, wherein the plurality of sites includes at least 20,000 sites.
 7. The method of claim 1, wherein at least a portion of the cell-free DNA molecules are cancer-derived molecules, the method further comprising determining a type of cancer of the subject based, at least in part, on the methylation profile.
 8. The method of claim 7, wherein the type of cancer is selected from the group consisting of lung cancer, breast cancer, colorectal cancer, prostate cancer, nasopharyngeal cancer, gastric cancer, testicular cancer, skin cancer, cancer affecting the nervous system, bone cancer, ovarian cancer, liver cancer, hematologic malignancies, pancreatic cancer, endometriocarcinoma, and kidney cancer.
 9. The method of claim 1, further comprising sequencing the cell-free DNA molecules to obtain the sequence reads.
 10. The method of claim 9, wherein the sequencing comprises methylation-aware sequencing.
 11. The method of claim 10, wherein the methylation-aware sequencing comprises bisulfite sequencing.
 12. The method of claim 10, further comprising enriching the cell-free DNA molecules before the sequencing, and wherein the enriching comprises use of hybridization probes, polymerase chain reaction amplification, or solid phase hybridization.
 13. The method of claim 1, wherein the plurality of sites comprise one or more CpG sites.
 14. The method of claim 13, wherein the one or more CpG sites comprise a plurality of CpG sites that are organized into one or more CpG islands.
 15. The method of claim 14, wherein the determining the methylation profile for the plurality of sites comprises, for each CpG island of a plurality of CpG islands, determining a number of sequence reads showing methylation at the CpG sites in the CpG island.
 16. The method of claim 1, wherein the determining the methylation profile for the plurality of sites comprises, for each site of the plurality of sites, determining a total number of sequence reads at the plurality of sites.
 17. The method of claim 1, wherein the analyzing further comprises determining locations of the cell-free DNA molecules in a genome.
 18. The method of claim 1, wherein the determining the methylation profile comprises determining a number of sequence reads of the cell-free DNA molecules showing methylation at sites in a genomic region.
 19. The method of claim 1, wherein the biological sample is selected from a group consisting of blood, plasma, serum, urine, vaginal fluid, uterine or vaginal flushing fluids, pleural fluid, ascitic fluid, cerebrospinal fluid, saliva, sweat, tears, sputum, bronchoalveolar lavage fluid, and stool.
 20. A non-transitory computer-readable medium comprising instructions that, upon execution by one or more computer processors of a computer system, cause the computer system to perform a method, the method comprising: (a) obtaining sequence reads for cell-free DNA molecules from a biological sample of a subject, wherein the sequence reads include methylation statuses for the cell-free DNA molecules at single nucleotide resolution, and wherein the sequence reads comprise at least 100,000 sequence reads; (b) analyzing, including aligning to a reference genome, the at least 100,000 sequence reads to determine a methylation profile for a plurality of sites based on the methylation statuses for the plurality of sites; and (c) determining a tissue source for at least a portion of the cell-free DNA molecules from the biological sample based, at least in part, on the methylation profile.